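Research on Whispering Speaker Recognition

Original title: 耳语音说话人识别的研究
Author: Ding Guoliang (丁国梁)
Thesis level: Master's
Discipline: Signal and Information Processing
Keywords: Whispered Speech; Speaker Recognition; MFCC; Gaussian Mixture Model
Degree year: 2009
Advisor: Zhao Heming (赵鹤鸣)
Discipline code: 081002
Degree-granting institution: Soochow University (苏州大学)
Thesis submission date: 2009-05-01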

Abstract

Whispering speaker recognition automatically identifies a speaker from the speaker-related information carried in whispered speech. It can be applied to telephone banking, identity verification in special settings, private communication in public places, and certain national security needs. It is a relatively new topic, and many problems remain open.
     Because whispered speech is produced differently from normal speech, the two behave very differently in speaker recognition. This thesis builds a speaker recognition system based on the Gaussian mixture model (GMM). Through a study of text-independent speaker identification, it compares whispered speech with normal speech and improves the whispering speaker recognition system by modifying the features. The main contributions are as follows:
     A whispered speech corpus and a normal speech corpus were recorded from 22 speakers. Mel-frequency cepstral coefficients (MFCC), linear prediction cepstral coefficients (LPCC), delta MFCC (ΔMFCC), delta LPCC (ΔLPCC), and the combined feature MFCC+LPCC were used as feature parameters to compare speaker recognition performance on normal and whispered speech.
     Using both corpora, the effect of the MFCC dimension on speaker recognition was compared for normal and whispered speech. In the experiments, the recognition rate peaked at 16 dimensions for normal speech and at 50 dimensions for whispered speech.
     An improved MFCC is proposed in which the filter bank is designed band by band: the frequency range is divided into several bands, and the filters for each band are designed independently, so that the improved MFCC better represents the local frequency characteristics of the signal. Experiments show that the improved MFCC effectively improves the performance of the whispering speaker recognition system.
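
The band-by-band filter-bank idea can be illustrated with a minimal sketch. The abstract does not give the band boundaries, the number of filters per band, the filter shape, or how the cepstrum is computed, so everything concrete here is an assumption made for illustration: the band edges `BAND_EDGES_HZ`, the counts `FILTERS_PER_BAND`, the triangular mel-spaced filters, and the single DCT over all log energies. The function names are hypothetical and do not come from the thesis.

```python
import numpy as np

def hz_to_mel(f):
    # Common mel-scale formula; the thesis may use a different variant.
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def triangular_bank(f_lo, f_hi, n_filters, n_fft, sample_rate):
    """Triangular filters, mel-spaced, confined to the band [f_lo, f_hi] Hz."""
    freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    # n_filters + 2 edge points, equally spaced on the mel scale inside the band.
    edges = mel_to_hz(np.linspace(hz_to_mel(f_lo), hz_to_mel(f_hi), n_filters + 2))
    bank = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        left, center, right = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - left) / (center - left)
        falling = (right - freqs) / (right - center)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def subband_filterbank(band_edges_hz, filters_per_band, n_fft, sample_rate):
    """Design each band's filters independently, then stack them."""
    bands = zip(band_edges_hz[:-1], band_edges_hz[1:])
    banks = [triangular_bank(lo, hi, n, n_fft, sample_rate)
             for (lo, hi), n in zip(bands, filters_per_band)]
    return np.vstack(banks)

def cepstra_from_power(power_frames, bank, n_ceps):
    """Log filter-bank energies followed by an unnormalized DCT-II.

    Assumption: one DCT over all bands together; the thesis may differ.
    """
    log_energy = np.log(power_frames @ bank.T + 1e-10)  # (frames, n_filters)
    n = bank.shape[0]
    k = np.arange(n_ceps)[:, None]
    basis = np.cos(np.pi * k * (np.arange(n) + 0.5) / n)
    return log_energy @ basis.T  # (frames, n_ceps)

# Hypothetical configuration, chosen only to make the sketch runnable.
BAND_EDGES_HZ = [0.0, 1000.0, 4000.0, 8000.0]
FILTERS_PER_BAND = [8, 12, 6]
bank = subband_filterbank(BAND_EDGES_HZ, FILTERS_PER_BAND, n_fft=512, sample_rate=16000)
```

Because each band gets its own filter count, resolution can be raised in one band without changing the others, which is the property the abstract attributes to the improved MFCC. Which bands benefit from more filters for whispered speech is a finding of the thesis and is not reproduced here.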

