Research on Speech Endpoint Detection Algorithms Based on Fractal Dimension

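English title: The Endpoint Detection Algorithm of Speech Based on Fractal Dimension
Author: Zhang Zhenhong (张振红)
Degree level: Master's
Discipline: Communication and Information Systems
Keywords: endpoint detection; fractal dimension; fuzzy RBF neural network; parameter variance
Degree year: 2008
Supervisor: Zhang Xueying (张雪英)
Discipline code: 081001
Degree-granting institution: Taiyuan University of Technology
Thesis submission date: 2008-05-01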

Abstract

Endpoint detection of a speech signal means accurately locating the start and end points of speech within a signal segment that contains speech, thereby separating speech from non-speech. In a speech recognition system, effective endpoint detection reduces the amount of data collected, saves processing time, and removes interference from silent and noise-only segments, all of which improve recognition performance. In speech coding it also lowers the bit rate spent on noise and silent segments, which improves coding efficiency. Endpoint detection is therefore an important part of speech processing.
Accurate endpoint detection is difficult at low signal-to-noise ratio (SNR), especially in silent segments and immediately before and after an utterance. This thesis first surveys typical existing endpoint detection algorithms: those based on short-time energy and zero-crossing rate, on LPC cepstral features, on entropy functions, on hidden Markov models (HMM), and on sub-band average energy variance. It analyzes the features each algorithm uses and gives simulation results for some of them. These methods perform well in quiet conditions or when the noise is weak, but their accuracy falls off quickly in adverse conditions with low SNR. Building on earlier work, the thesis then proposes three new endpoint detection algorithms for noisy environments. Algorithm 1 is based on fractal dimension. It exploits the advantages of fractal dimension as an endpoint detection parameter under noise and overcomes the difficulty of estimating a decision threshold under noise. Algorithm 2 is based on fractal dimension and a fuzzy RBF neural network. It combines the advantages of fractal dimension as a detection parameter under noise with the threshold-free property of the endpoint detection method based on information entropy and neural networks. Simulation results show that it gives some improvement in detection accuracy for low-SNR signals. Algorithm 3 is based on a wavelet model of 1/f fractal signals and a fuzzy RBF neural network. Simulation results show that it works well in common noise environments, is simple to implement, and adapts well to different environments.
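
To make the idea behind the first algorithm concrete, the following is a minimal sketch of frame-wise fractal dimension used as an endpoint detection feature. It rests on several assumptions that are not taken from the thesis: the abstract does not say which fractal dimension estimator is used, so the Higuchi estimator is swapped in here; the decision rule (a threshold set a fixed margin below the mean fractal dimension of the leading frames, which are assumed to be noise only) is purely illustrative; and the names `higuchi_fd`, `frame_fd`, and `detect_speech_frames` and all parameter values are hypothetical.

```python
import numpy as np


def higuchi_fd(x, kmax=8):
    """Estimate the Higuchi fractal dimension of a 1-D sequence."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if n <= kmax:
        raise ValueError("sequence must be longer than kmax")
    ks = np.arange(1, kmax + 1)
    lengths = []
    for k in ks:
        per_offset = []
        for m in range(k):
            sub = x[m::k]                 # samples m, m+k, m+2k, ...
            steps = sub.size - 1
            if steps < 1:
                continue
            norm = (n - 1) / (steps * k)  # Higuchi's length normalisation
            per_offset.append(np.abs(np.diff(sub)).sum() * norm / k)
        lengths.append(np.mean(per_offset))
    # Curve length scales as k**(-FD), so FD is the slope of
    # log L(k) against log(1/k). The small constant guards log(0).
    log_len = np.log(np.array(lengths) + 1e-12)
    slope, _ = np.polyfit(np.log(1.0 / ks), log_len, 1)
    return float(slope)


def frame_fd(signal, frame_len=256, hop=128, kmax=8):
    """Fractal dimension of each overlapping frame of the signal."""
    signal = np.asarray(signal, dtype=float)
    starts = range(0, signal.size - frame_len + 1, hop)
    return np.array([higuchi_fd(signal[s:s + frame_len], kmax) for s in starts])


def detect_speech_frames(signal, noise_frames=10, margin=0.3, **frame_args):
    """Flag frames whose fractal dimension falls below a noise-derived threshold.

    Assumption (illustrative only): the first `noise_frames` frames hold
    noise, and speech-like frames have a lower fractal dimension than noise.
    """
    fd = frame_fd(signal, **frame_args)
    threshold = fd[:noise_frames].mean() - margin
    return fd < threshold
```

With this estimator, broadband noise gives values near 2 and a smooth waveform gives values near 1, so the first and last flagged frames in the mask returned by `detect_speech_frames` can serve as rough endpoint estimates. The margin and frame settings would need tuning on real recordings.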

