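Research on Key Technologies for Text-Independent Speaker Recognition

English title: Research on Text-Independent Speaker Recognition
Author: Yang Yanlong (杨延龙)
Degree level: Master's
Discipline: Communication and Information Systems
Keywords: Endpoint Detection; Speech Enhancement; Feature Extraction; GMM; Speaker Recognition
Degree year: 2010
Advisor: Gao Xiquan (高西全)
Discipline code: 081001
Degree-granting institution: Xidian University (西安电子科技大学)
Thesis submission date: 2010-01-01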

Abstract

This thesis analyzes and studies in depth the basic principles and related algorithms of text-independent speaker recognition. For endpoint detection, several commonly used detection algorithms are analyzed; to address the problems that arise when a single parameter is used for endpoint detection, the thesis combines short-time energy and Renyi entropy into an energy-Renyi entropy (ERE) parameter, which gives good results. For speech enhancement, the work focuses on β-order adaptive spectral subtraction. Building on it, the thesis iterates the spectral gain to approximate the true speech spectrum more closely, and applies the resulting algorithm to enhance the output sub-band energies during extraction of the recently developed CMSBS parameters, with very good results. For the recognition model, following an analysis of the basic principles and related algorithms of the GMM, the work focuses on the SGML algorithm for GMM parameter estimation. The algorithm searches for the best number of mixture components by self-splitting and steadily improves parameter estimation accuracy during the splitting process, which resolves two difficulties of EM-based estimation: the mixture number is hard to choose, and the result is sensitive to initial values. Following this analysis, the performance of the above algorithms was verified on the VC platform.
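The ERE parameter mentioned in the abstract combines per-frame short-time energy with Renyi entropy. The following is a minimal Python sketch of those two ingredients. The abstract does not state how they are combined, so `ere_score`, the order `alpha = 2`, and the use of the normalized power spectrum as the probability distribution are illustrative assumptions, not the thesis's definition.

```python
import cmath
import math


def short_time_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(s * s for s in frame)


def power_spectrum(frame):
    """One-sided power spectrum via a direct DFT (adequate for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def renyi_entropy(probs, alpha=2.0):
    """Renyi entropy of order alpha (alpha > 0, alpha != 1), in nats."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    return math.log(sum(p ** alpha for p in probs)) / (1.0 - alpha)


def spectral_renyi_entropy(frame, alpha=2.0):
    """Renyi entropy of the frame's normalized power spectrum."""
    spec = power_spectrum(frame)
    total = sum(spec)
    if total == 0.0:
        # Silent frame: treat the spectrum as flat, i.e. maximum entropy.
        return math.log(len(spec))
    return renyi_entropy([v / total for v in spec], alpha)


def ere_score(frame, alpha=2.0):
    """HYPOTHETICAL combination: energy times the entropy deficit.

    Assumption for illustration only: frames that are both energetic and
    spectrally structured (low entropy) receive a high score.
    """
    max_entropy = math.log(len(frame) // 2 + 1)
    return short_time_energy(frame) * (max_entropy - spectral_renyi_entropy(frame, alpha))


# Usage: a pure tone scores higher than digital silence.
tone = [math.sin(2 * math.pi * 4 * t / 32) for t in range(32)]
silence = [0.0] * 32
print(ere_score(tone) > ere_score(silence))
```

In an endpoint detector, a per-frame score of this kind would be compared against a threshold to mark speech onsets and offsets. The thresholding rule is not described in the abstract and is left out here.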

