Research on Feature-Based Voice Activity Detection in Noisy Environments (噪声环境下基于特征的语音端点检测研究)

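English title: The Research of Voice Activity Detection Based on Characters in Noise Environment
Author: Zhao Lixia (赵丽霞)
Degree level: Master's
Discipline: Computer Science and Technology
Chinese keywords: 语音端点检测 (voice activity detection) ; 特征 (feature) ; 熵 (entropy) ; 支持向量机 (support vector machine)
English keywords: VAD ; Feature ; Entropy ; SVM
Degree year: 2010
Advisor: Zhao Huan (赵欢)
Discipline code: 081203
Degree-granting institution: Hunan University (湖南大学)
Thesis submission date: 2010-04-26
Defense committee chair: Kuang Jishun (邝继顺)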

Abstract

Voice activity detection (VAD) aims to locate the start and end points of speech within a signal that contains speech. As a front-end step in speech signal processing, it is widely used in speech enhancement, speech coding, and speech recognition. VAD methods fall into two classes: feature-based and model-based. Model-based methods are relatively complex and adapt poorly to changing environments, whereas feature-based methods are comparatively simple and offer some noise immunity, provided that a robust feature separating speech from noise can be found. This thesis studies feature-based VAD.
     Because detection based on spectral entropy is not robust at low signal-to-noise ratios (SNR), we propose a new detection algorithm based on distance entropy. The algorithm exploits the robustness of entropy and of cepstral coefficients by changing how the probability density is computed. A series of operations on the preprocessed noisy signal yields the cepstral coefficients of each point; Euclidean distances are computed from the cepstral coefficients; a probability density function is constructed from those distances; and the distance-entropy feature is derived from that probability density function. Finally, a dual-threshold decision on the distance entropy separates speech from noise.
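The chain from cepstral coefficients to a dual-threshold decision can be sketched in a few lines. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: the function names, the use of a reference cepstrum, the per-coefficient absolute difference as the (one-dimensional) Euclidean distance, the normalisation of distances into probabilities, and the convention that larger scores mean speech. It is not the thesis's actual algorithm, and it starts from cepstral vectors that are taken as already computed.

```python
import math

def distance_entropy(cepstrum, reference):
    """Entropy of the normalised distances between a frame and a reference.

    Assumption: the distance is the per-coefficient absolute difference,
    and the distances are normalised so that they sum to 1.
    """
    dists = [abs(c - r) for c, r in zip(cepstrum, reference)]
    total = sum(dists)
    if total == 0.0:
        return 0.0  # frame identical to the reference: no distance mass
    probs = [d / total for d in dists]
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def dual_threshold(scores, low, high):
    """Hysteresis decision: enter speech above `high`, leave it below `low`.

    Assumption: larger scores indicate speech; negate the scores (and the
    thresholds) if the chosen feature behaves the other way round.
    """
    flags, in_speech = [], False
    for s in scores:
        if not in_speech and s > high:
            in_speech = True
        elif in_speech and s < low:
            in_speech = False
        flags.append(in_speech)
    return flags

# Hypothetical toy input: four cepstral coefficients per frame.
reference = [0.0, 0.0, 0.0, 0.0]
frames = [[1.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0], [0.5, 0.5, 0.0, 0.0]]
scores = [distance_entropy(f, reference) for f in frames]
decisions = dual_threshold(scores, low=0.5, high=1.0)
```

Using two thresholds instead of one keeps the decision from flickering when the feature hovers near a single cut-off; the threshold values here are placeholders and would have to be tuned on real data.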
     We also propose a multi-feature detection algorithm based on a support vector machine (SVM). It extracts three features from the noisy signal, namely SNR, a modified zero-crossing rate, and AMMM, and assembles them into a feature matrix. Part of the noisy signal is used to train the SVM and set its parameters, after which the trained SVM automatically separates speech from noise.
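As a rough illustration of training a classifier on a three-column feature matrix, the sketch below fits a linear SVM by sub-gradient descent on the regularised hinge loss. The abstract does not state the kernel, the solver, or how the features are scaled, so the linear kernel, the training routine, the hyperparameters, and the toy feature values are all assumptions; the feature extraction itself (SNR, modified zero-crossing rate, AMMM) is not reproduced and the rows are hypothetical, pre-normalised numbers.

```python
def train_linear_svm(rows, labels, epochs=200, lr=0.1, lam=0.01):
    """Fit weights w and bias b by sub-gradient descent on the hinge loss.

    rows   : feature matrix, one row per frame
    labels : +1 for speech, -1 for noise
    """
    w = [0.0] * len(rows[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # L2 regularisation shrinks the weights on every step.
            w = [wi * (1.0 - lr * lam) for wi in w]
            if margin < 1.0:  # inside the margin or misclassified
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    """Return +1 (speech) or -1 (noise) for one feature row."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0.0 else -1

# Hypothetical, pre-normalised rows: [SNR, modified ZCR, AMMM].
FEATURES = [
    [0.9, 0.8, 0.7], [0.8, 0.9, 0.6], [0.7, 0.7, 0.9],  # speech frames
    [0.1, 0.2, 0.1], [0.2, 0.1, 0.3], [0.3, 0.2, 0.2],  # noise frames
]
LABELS = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(FEATURES, LABELS)
predictions = [predict(w, b, row) for row in FEATURES]
```

In practice a library implementation with a tuned kernel would normally replace this hand-written trainer; the point here is only the shape of the data flow: feature rows in, a trained decision function out.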
     The noisy signals used in the experiments are mixtures of clean speech from the French aurora2.0 library and noise from the Noisex92 noise library, and the simulations were run in MATLAB. The results indicate that both proposed algorithms have a degree of robustness and can still separate speech from noise well at fairly low SNR.
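Mixing clean speech with noise at a chosen SNR is usually done by scaling the noise so that the power ratio matches the target. The sketch below shows that standard calculation; the function names and the toy signals are illustrative assumptions, and the abstract does not say which SNR levels or scaling procedure the experiments actually used.

```python
import math

def power(signal):
    """Mean power of a sequence of samples."""
    return sum(v * v for v in signal) / len(signal)

def mix_at_snr(clean, noise, snr_db):
    """Add noise to clean speech so that the mixture has the target SNR.

    The noise is truncated to the length of the clean signal and scaled by
    a gain g chosen so that 10*log10(P_clean / (g^2 * P_noise)) == snr_db.
    Returns the noisy signal and the scaled noise.
    """
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[:len(clean)]
    gain = math.sqrt(power(clean) / (power(noise) * 10.0 ** (snr_db / 10.0)))
    scaled = [gain * n for n in noise]
    noisy = [c + n for c, n in zip(clean, scaled)]
    return noisy, scaled

# Hypothetical toy signals standing in for corpus recordings.
clean = [math.sin(2.0 * math.pi * 5.0 * i / 200.0) for i in range(200)]
noise = [((i * 37) % 17 - 8) / 8.0 for i in range(256)]
noisy, scaled_noise = mix_at_snr(clean, noise, snr_db=5.0)
achieved_snr = 10.0 * math.log10(power(clean) / power(scaled_noise))
```

Returning the scaled noise alongside the mixture makes it easy to confirm that the achieved SNR matches the requested one before running a detector on the result.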

