Voice Activity Detection Method Based on the Spectral Entropy Mel Product (MFPH)

Original title: 基于谱熵梅尔积的语音端点检测方法
Authors: WU Xin-zhong (吴新忠); XIA Ling-xiang (夏令祥); ZHANG Xu (张旭); ZHOU Cheng (周成)
Affiliation: School of Information and Control Engineering, China University of Mining and Technology (中国矿业大学信息与控制工程学院)
Keywords: voice activity detection; Mel frequency cepstral coefficient; spectral entropy; spectral entropy Mel product; double-threshold method; low signal-to-noise ratio
Journal code: BJYD
Journal: Journal of Beijing University of Posts and Telecommunications
Publication date: 2019-05-08 10:50
Publisher: 北京邮电大学学报 (Journal of Beijing University of Posts and Telecommunications)
Year: 2019
Issue: v.42
Funding: National Key R&D Program of China, 13th Five-Year Plan (2016YFC0801800); Key R&D Program of Jiangsu Province (BE2016046)
Language: Chinese
Page: BJYD201902014
Page count: 7
CN: 02
ISSN: 11-3570/TN
Classification number: 87-93

Abstract

To address the low accuracy of traditional voice activity detection (VAD) algorithms in low signal-to-noise ratio (SNR) environments, a VAD algorithm based on the spectral entropy Mel product (MFPH) is proposed. First, the first-dimension parameter MFCC0 of the Mel frequency cepstral coefficients is extracted from the noisy speech signal, and its product with the spectral entropy is used as the fused feature that distinguishes speech segments from background noise. Next, the threshold of the MFPH feature is estimated adaptively by combining the fuzzy C-means (FCM) clustering algorithm with the Bayesian information criterion (BIC). Finally, the double-threshold method is applied to detect the speech endpoints. Experiments show that, compared with traditional methods, the proposed method considerably improves endpoint detection accuracy in low-SNR environments from -5 dB to 15 dB.
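The feature extraction and double-threshold steps named in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the frame length, hop size, number of Mel filters, the use of the summed log Mel energies as MFCC0 (the zeroth DCT-II coefficient up to a constant scale factor), and the function names `frame_signal`, `mel_filterbank`, `mfph` and `double_threshold` are all assumptions. The abstract does not give the paper's normalization of the product, and the FCM/BIC threshold estimation is not reproduced here, so `t_low` and `t_high` are supplied by the caller.

```python
import numpy as np

EPS = 1e-12  # guards against log(0) and division by zero


def frame_signal(x, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping Hamming-windowed frames."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(frame_len)


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular Mel filters, shape (n_filters, n_fft // 2 + 1)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        lo, mid, hi = bins[i], bins[i + 1], bins[i + 2]
        fb[i, lo:mid] = (np.arange(lo, mid) - lo) / max(mid - lo, 1)  # rising edge
        fb[i, mid:hi] = (hi - np.arange(mid, hi)) / max(hi - mid, 1)  # falling edge
    return fb


def mfph(x, sr=8000, frame_len=256, hop=128, n_filters=24):
    """Per-frame product of MFCC0 and spectral entropy (assumed, unnormalized form)."""
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    log_mel = np.log(power @ mel_filterbank(n_filters, frame_len, sr).T + EPS)
    mfcc0 = log_mel.sum(axis=1)  # zeroth DCT-II coefficient, up to scaling
    p = power / (power.sum(axis=1, keepdims=True) + EPS)  # spectral probability mass
    entropy = -(p * np.log(p + EPS)).sum(axis=1)
    return mfcc0 * entropy


def double_threshold(feat, t_low, t_high):
    """Return (start, end) frame index pairs, both inclusive.

    Assumes larger feature values indicate speech: a segment is seeded by a
    frame above t_high and extended in both directions while frames stay
    above t_low.
    """
    segments, n, i = [], len(feat), 0
    while i < n:
        if feat[i] > t_high:
            start = end = i
            while start > 0 and feat[start - 1] > t_low:
                start -= 1
            while end + 1 < n and feat[end + 1] > t_low:
                end += 1
            segments.append((start, end))
            i = end + 1
        else:
            i += 1
    return segments
```

For a feature that is lower during speech, the same `double_threshold` routine could be applied to the negated feature with negated thresholds. Frame indices convert to sample positions as `start * hop` and `end * hop + frame_len`.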

References

[1] Zhao Li. Speech signal processing[M]. Beijing: China Machine Press, 2016: 116-117. (in Chinese)
[2] Cao D, Gao X, Gao L. An improved endpoint detection algorithm based on MFCC cosine value[J]. Wireless Personal Communications, 2017, 95(3): 2073-2090.
[3] Suh Y, Kim H. Multiple acoustic model-based discriminative likelihood ratio weighting for voice activity detection[J]. IEEE Signal Processing Letters, 2012, 19(8): 507-510.
[4] Zhang Yi, Ni Lei. Speech activity detection based on fuzzy entropy and improved relevance vector machine[J]. Journal of Huazhong University of Science and Technology (Natural Science Edition), 2017, 45(8): 15-19. (in Chinese)
[5] Kim S K. Voice activity detection algorithm using radial basis function network[J]. Electronics Letters, 2004, 40(22): 1454-1455.
[6] Zhang Xiaolei, Wu Ji, Lü Ping. Support vector machine based VAD using the multiple observation compound feature[J]. Journal of Tsinghua University (Science and Technology), 2011, 51(9): 1209-1214. (in Chinese)
[7] Hu Bo, Xiao Xi. Endpoint detection and pitch determination method based on a probability model[J]. Journal of Tsinghua University (Science and Technology), 2013, 53(6): 749-752. (in Chinese)
[8] Davis S B, Mermelstein P. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences[J]. Readings in Speech Recognition, 1980, 28(4): 65-74.
[9] Huang L S, Yang C H. A novel approach to robust speech endpoint detection in car environments[C]//IEEE International Conference on Acoustics, Speech, and Signal Processing, 2000. [S.l.]: IEEE, 2000: 1751-1754.
[10] McClellan S, Gibson J D. Variable-rate CELP based on subband flatness[J]. IEEE Transactions on Speech and Audio Processing, 1995, 5(2): 120-130.
[11] Jin L, Cheng J. An improved speech endpoint detection based on spectral subtraction and adaptive sub-band spectral entropy[C]//International Conference on Intelligent Computation Technology and Automation. [S.l.]: IEEE Computer Society, 2010: 591-594.
[12] Tian Y, Wu J, Wang Z, et al. Fuzzy clustering and Bayesian information criterion based threshold estimation for robust voice activity detection[C]//IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. [S.l.]: IEEE, 2003: I-444-I-447.
[13] Cobos C, Mendoza M, Manic M, et al. Clustering of web search results based on an iterative fuzzy C-means algorithm and Bayesian information criterion[J]. Information Sciences, 2014, 281(2): 248-264.
[14] Volinsky C T, Raftery A E. Bayesian information criterion for censored survival models[J]. Biometrics, 2015, 56(1): 256-262.
