Application of a dual-mini microphone array speech enhancement algorithm in speaker recognition

Original title: 双微阵列语音增强算法在说话人识别中的应用
Authors: MAO Wei (毛维); ZENG Qing-ning (曾庆宁); LONG Chao (龙超); School of Information and Communication, Guilin University of Electronic Technology
Keywords: dual-mini array; speech enhancement; coherence filtering; minimum variance distortionless response; modified Wiener filter; speaker recognition
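Journal code: SXJS
Journal: Technical Acoustics (声学技术)
Institution: School of Information and Communication, Guilin University of Electronic Technology
Publication date: 2018-06-15
Publisher: Technical Acoustics (声学技术)
Year: 2018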
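Volume: 37
Funding: National Natural Science Foundation of China (61461011); Ministry of Education Key Laboratory Director's Fund, 2016 (CRKL160107); Guilin University of Electronic Technology Graduate Research Innovation Project (2017YJCX16, 2017YJCX20)
Language: Chinese
Document ID: SXJS201803012
Page count: 8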
Issue: 03
CN: 31-1449/TB
Pages: 55-62

Abstract

Speaker recognition performance degrades markedly in complex noise environments. To address this, a dual-mini microphone array speech enhancement algorithm is proposed for the front end of a speaker recognition system. The algorithm combines coherence filtering with a frequency-domain broadband minimum variance distortionless response (MVDR) beamformer followed by an improved Wiener post-filter. First, the coherence function between two adjacent channels of the dual-mini microphone array signals is computed, and the inter-channel coherence is used for initial noise suppression. Next, the frequency-domain broadband MVDR beamformer preserves the signal arriving from the target source direction while suppressing interference from other directions, and the improved Wiener filter then removes residual noise to improve speech quality. Finally, Mel frequency cepstral coefficients (MFCC) and Gammatone filter-bank frequency cepstral coefficients (GFCC) are extracted from the enhanced speech and used for speaker recognition. In the simulations, the data were collected with an acoustic artificial head that emulates binaural recording. The experimental results show that the proposed algorithm achieves good enhancement in complex noise environments and effectively improves the recognition rate of the speaker recognition system.
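
The MVDR step named in the abstract can be illustrated with a minimal sketch for a single frequency bin of a two-microphone array. This is not the authors' implementation: the far-field steering model, the 2 cm microphone spacing, the source angles, the synthetic noise covariance, and the diagonal loading term are all assumptions chosen for illustration. It uses only the textbook MVDR solution w = R^-1 d / (d^H R^-1 d), which passes the target direction with unit gain while minimizing output noise power.

```python
import numpy as np

def steering_vector(freq_hz, mic_spacing_m, angle_rad, c=343.0):
    """Far-field steering vector for a two-microphone array (assumed model)."""
    delay = mic_spacing_m * np.cos(angle_rad) / c
    return np.array([1.0, np.exp(-2j * np.pi * freq_hz * delay)])

def mvdr_weights(R, d, loading=1e-6):
    """MVDR weights for one frequency bin: w = R^-1 d / (d^H R^-1 d)."""
    M = R.shape[0]
    # Diagonal loading is an assumption added here to keep R well conditioned.
    R_loaded = R + loading * (np.trace(R).real / M) * np.eye(M)
    Rinv_d = np.linalg.solve(R_loaded, d)
    return Rinv_d / (d.conj() @ Rinv_d)

# Hypothetical scenario: target at 60 degrees, interferer at 0 degrees (endfire).
freq = 1000.0
spacing = 0.02
d_target = steering_vector(freq, spacing, np.pi / 3)
d_interf = steering_vector(freq, spacing, 0.0)

# Synthetic noise covariance: one strong directional interferer plus sensor noise.
R = 10.0 * np.outer(d_interf, d_interf.conj()) + 0.1 * np.eye(2)

w = mvdr_weights(R, d_target)

# np.vdot conjugates its first argument, so this evaluates w^H d.
print("target response:", abs(np.vdot(w, d_target)))
print("interferer response:", abs(np.vdot(w, d_interf)))
```

In a broadband frequency-domain beamformer the same computation would be repeated for every bin of a short-time Fourier transform, with R estimated from the observed signals rather than constructed synthetically as it is here.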

