Research on Speaker Recognition Based on Neural Networks and HMM

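Author: Liang Hui
Degree level: Master's
Discipline: Detection Technology and Automatic Equipment
Chinese keywords: speaker recognition; hidden Markov model; wavelet analysis; self-organizing neural network
English keywords: speaker recognition; Hidden Markov Model; wavelet; neural network
Degree year: 2012
Supervisor: Zeng Shuiping
Discipline code: 081102
Degree-granting institution: North China University of Technology
Submission date: 2012-05-31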

Abstract

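The purpose of speaker recognition is to determine who is speaking. The process first selects a set of acoustic features. It then applies a modeling algorithm to build a unique template library for each speaker, matches the input against each template in turn, and outputs the best match. The feature parameters widely used in speaker recognition each have strengths and weaknesses, and their recognition performance is not entirely satisfactory. No feature parameter has yet been found that fully characterizes the individual differences between speakers. This thesis discusses several common feature parameters and modeling algorithms. It then builds a speaker recognition system around a new wavelet feature parameter and an improved neural-network algorithm.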
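The enroll-then-match loop described above can be illustrated with a minimal sketch. This is not the thesis's system, which builds HMM templates. It assumes, for illustration only, that each speaker's template is simply the mean of that speaker's feature vectors and that matching uses average Euclidean distance. The names `enroll`, `identify`, and `distortion` are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def mean_vector(frames: List[Vector]) -> List[float]:
    """Average one speaker's feature frames into a single template vector."""
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / len(frames) for d in range(dim)]


def enroll(training: Dict[str, List[Vector]]) -> Dict[str, List[float]]:
    """Build the template library: one mean-vector template per speaker (assumption)."""
    return {speaker: mean_vector(frames) for speaker, frames in training.items()}


def distortion(frames: List[Vector], template: Vector) -> float:
    """Average Euclidean distance between the test frames and one template."""
    return sum(math.dist(frame, template) for frame in frames) / len(frames)


def identify(frames: List[Vector], library: Dict[str, List[float]]) -> str:
    """Compare the test frames with every template in turn; return the best match."""
    return min(library, key=lambda speaker: distortion(frames, library[speaker]))


if __name__ == "__main__":
    library = enroll({
        "speaker_a": [[1.0, 2.0], [1.2, 1.8]],
        "speaker_b": [[5.0, 5.0], [4.8, 5.2]],
    })
    print(identify([[1.1, 2.1]], library))  # speaker_a
```

In a real system, the feature vectors would be per-frame MFCC or wavelet MFCC vectors. The template would be a trained model such as an HMM or a VQ codebook rather than a single mean vector.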
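     This thesis introduces the Mel-frequency cepstral coefficients (MFCC), a feature parameter defined in the cepstral domain. MFCC is somewhat weak at discriminating the individual characteristics of speakers, so the thesis combines the cepstrum principle with the wavelet transform to extract a wavelet MFCC feature. On the modeling side, the thesis describes how the initial parameter values of the hidden Markov model (HMM) are chosen. It analyzes the prevailing practice of setting those initial values with K-means clustering. It also introduces self-organizing neural network clustering algorithms and compares them with K-means clustering in terms of convergence speed during training.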
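The sketch below shows one plausible way to combine the cepstrum principle with a wavelet transform. It decomposes a frame with a dyadic Haar DWT, takes the log energy of each subband, and applies a DCT. This mirrors the log and DCT steps of MFCC but replaces the FFT and Mel filterbank with wavelet subbands. The abstract does not state the wavelet basis, the number of decomposition levels, or the band layout. The Haar basis, `levels=4`, and all function names are therefore assumptions, not the thesis's exact method.

```python
import math
from typing import List, Tuple


def haar_step(signal: List[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar DWT: (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail


def subband_energies(frame: List[float], levels: int) -> List[float]:
    """Energies of the final approximation band and `levels` detail bands,
    ordered from the lowest-frequency band to the highest."""
    energies = []
    approx = list(frame)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        energies.append(sum(c * c for c in detail))
    energies.append(sum(c * c for c in approx))
    return energies[::-1]


def dct_ii(x: List[float]) -> List[float]:
    """Unnormalized DCT-II, the same transform used in the last step of MFCC."""
    n = len(x)
    return [sum(x[k] * math.cos(math.pi * m * (k + 0.5) / n) for k in range(n))
            for m in range(n)]


def wavelet_cepstrum(frame: List[float], levels: int = 4,
                     eps: float = 1e-10) -> List[float]:
    """Log subband energies followed by a DCT: levels + 1 coefficients per frame.
    `eps` avoids log(0) for silent bands."""
    if len(frame) < 2 ** levels:
        raise ValueError("frame too short for the requested number of levels")
    log_energies = [math.log(e + eps) for e in subband_energies(frame, levels)]
    return dct_ii(log_energies)


if __name__ == "__main__":
    # A 256-sample test frame: a 500 Hz tone sampled at 8 kHz.
    frame = [math.sin(2 * math.pi * 500 * n / 8000) for n in range(256)]
    print(wavelet_cepstrum(frame))  # 5 coefficients for levels=4
```

Standard MFCC extraction applies the same log and DCT steps to Mel filterbank energies computed from an FFT power spectrum. Here, a 4-level decomposition yields 5 coefficients per frame. Because the Haar transform is orthonormal, the subband energies sum to the frame energy.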
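     Experiments show that the wavelet MFCC feature greatly reduces the amount of computation. It also raises the system recognition rate to 94.4%, about 7 percentage points higher than the 87.5% obtained with MFCC features. Further experiments use a self-organizing feature map (SOFM) neural network and a self-organizing competitive neural network to improve on K-means clustering. In these experiments, the feature parameters recorded for different speakers, together with the training iteration counts and recognition rates obtained with different modeling algorithms, serve as the basis for identifying the advantages, disadvantages, and open problems of each algorithm.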
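The convergence comparison between K-means and self-organizing networks can be illustrated with a small sketch. It counts training passes for batch K-means and for an online 1-D self-organizing map. Setting the neighborhood radius to 0 turns the map into a plain winner-take-all competitive network. The following settings are all assumptions for illustration, not the thesis's parameters:

- initialization from evenly spaced samples
- a linearly decaying learning rate and radius
- half-strength neighbor updates
- the per-epoch stopping tolerance

The resulting cluster centers are the kind of values that could seed HMM state means.

```python
import math
import random
from typing import List, Tuple

Point = List[float]


def nearest(p: Point, centers: List[Point]) -> int:
    """Index of the center closest to p (Euclidean distance)."""
    return min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))


def init_centers(points: List[Point], k: int) -> List[Point]:
    """Assumed initialization: k evenly spaced samples from the data."""
    step = max(1, len(points) // k)
    return [list(p) for p in points[::step][:k]]


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> Tuple[List[Point], int]:
    """Batch K-means; returns (centers, passes until assignments stop changing)."""
    centers = init_centers(points, k)
    labels = None
    for it in range(1, max_iter + 1):
        new_labels = [nearest(p, centers) for p in points]
        if new_labels == labels:
            return centers, it
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep an empty cluster's center unchanged
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, max_iter


def som(points: List[Point], k: int, radius: int = 1, epochs: int = 100,
        lr: float = 0.5, tol: float = 1e-3, seed: int = 0) -> Tuple[List[Point], int]:
    """Online 1-D self-organizing map with k units; returns (weights, epochs run).
    radius=0 reduces it to a winner-take-all competitive network."""
    rng = random.Random(seed)
    weights = init_centers(points, k)
    order = list(range(len(points)))
    for epoch in range(1, epochs + 1):
        decay = 1.0 - (epoch - 1) / epochs           # linear decay from 1 toward 0
        rate, r = lr * decay, round(radius * decay)  # shrinking rate and neighborhood
        before = [list(w) for w in weights]
        rng.shuffle(order)
        for i in order:
            win = nearest(points[i], weights)
            for j in range(max(0, win - r), min(k, win + r + 1)):
                h = rate if j == win else rate / 2   # neighbors move half as far
                weights[j] = [w + h * (a - w) for a, w in zip(points[i], weights[j])]
        moved = max(math.dist(b, w) for b, w in zip(before, weights))
        if moved < tol:                              # weights have settled
            return weights, epoch
    return weights, epochs


if __name__ == "__main__":
    data = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [5.0, 5.0], [5.2, 4.9], [4.9, 5.1]]
    print("K-means passes:", kmeans(data, 2)[1])
    print("SOFM epochs:", som(data, 2, radius=1)[1])
    print("competitive epochs:", som(data, 2, radius=0)[1])
```

A K-means pass and an SOM epoch do not cost the same, and the SOM count depends heavily on the assumed learning-rate schedule and tolerance. Raw counts from a sketch like this are therefore only a rough proxy for a convergence-speed comparison.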
