Research on Feature Parameter Extraction and Robustness Techniques in Speaker Identification
Abstract
Speech is one of the main sources of information for human beings, and it is also the most convenient, effective, and natural tool for communication. Speech recognition studies how to make machines accurately recognize the content of human speech, with the aim of facilitating communication between people and machines. Speaker recognition is a special form of speech recognition whose goal is not to recognize what a speaker says but to identify who is speaking. Speaker recognition technology has made great progress over the past thirty-odd years, and its applications have brought considerable convenience to daily life. However, as the technology is put to practical use, different application domains place ever higher demands on it. On the one hand, the variability of a speaker's pronunciation requires features suited to speaker recognition in order to guarantee system performance; on the other hand, noisy environments, the duration of training and test data, and distortion of the communication channel all seriously affect the performance of speaker recognition systems in real applications. Focusing on the text-independent speaker identification task, this dissertation studies two aspects: the extraction of speaker-specific features and noise-robustness techniques. The main contributions are as follows:
     1. An identification algorithm based on feature transformation and a fuzzy least-squares support vector machine (LS-SVM) is proposed. To overcome the limitation of the LS-SVM model when presented with large amounts of speech data, the conventional Mel-frequency cepstral coefficients (MFCC) are first transformed with a Gaussian mixture model (GMM), which resolves the problem that the number of variables in the linear system solved during LS-SVM training is tied to the number of feature vectors; second, a fuzzy membership function is introduced to handle the unclassifiable data that arise when the LS-SVM is extended from binary classification to the multi-class case of speaker identification. As a classical generative model, the GMM not only effectively reduces and compresses the amount of data, but also, because the result of the clustering transformation is the set of GMM mean vectors, represents the speaker well and highlights speaker information. The algorithm combines the strength of the GMM in fitting data with the strength of the LS-SVM in discriminative classification, thereby improving system performance.
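A minimal sketch of this idea follows (not the thesis implementation): an utterance's MFCC frames are compressed to the GMM mean vectors, and a binary LS-SVM is then obtained by solving a single linear system. The RBF kernel, the mixture size, and the helper names gmm_transform and lssvm_train are illustrative assumptions, and the fuzzy-membership extension to the multi-class case is omitted.

```python
# A minimal sketch of the idea (not the thesis code), assuming per-utterance
# MFCC frames in a (n_frames, n_dims) array; the RBF kernel, mixture size and
# function names are illustrative choices.
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_transform(mfcc_frames, n_components=8):
    """Compress an utterance's MFCC frames to the GMM mean vectors, so the
    LS-SVM linear system no longer grows with the number of frames."""
    gmm = GaussianMixture(n_components=n_components, covariance_type="diag",
                          random_state=0).fit(mfcc_frames)
    return gmm.means_                        # shape: (n_components, n_dims)

def lssvm_train(X, y, gamma=1.0, sigma=1.0):
    """Binary LS-SVM: solve one linear system instead of a quadratic program.
    X: (n, d) training vectors; y: labels in {-1, +1}."""
    n = X.shape[0]
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-d2 / (2.0 * sigma ** 2))     # RBF kernel matrix
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = np.outer(y, y) * K + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    b, alpha = sol[0], sol[1:]
    return b, alpha   # decision: sign(sum_i alpha_i * y_i * K(x, x_i) + b)
```

In the multi-class setting, one such machine per speaker could be trained in a one-versus-rest fashion, with the fuzzy membership values resolving frames that fall into overlapping or unclassifiable regions.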
     2. An anti-noise algorithm based on GMM-based compensation transformation of perceptual features is proposed. Starting from the characteristics of human auditory perception, the perceptual linear prediction model simulates the ear's auditory properties at several levels. Considering the spectral details of speech, the critical-band spectral analysis that smooths away speaker information is removed, and the modified perceptual log area ratio (MPLAR) coefficients, which show good separability, are extracted as speaker features. On this basis, according to the acoustic characteristics of speaker recognition and considering the match scores as a whole, a nonlinear transformation is applied to the likelihood scores output by the models: it widens the score ratio between the target model and non-target models and pulls the per-frame scores of the same model closer together, so that each model's score depends not only on the likelihood at the current frame but also on the likelihoods of the previous K frames, which solves the problem of MPLAR's noise robustness under different types of noise. The identification algorithm based on perceptual features and model compensation not only provides more separable features but also, by exploiting the statistical properties of the overall scores in the model matching stage, yields stable model scores and strengthens the system's recognition ability in noisy environments.
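A hedged sketch of the score compensation follows; the exact nonlinear transform is defined in the thesis, and the moving average of log-likelihoods over the previous K frames used here (windowed_model_scores) is only an assumed stand-in that makes each frame score depend on the current frame and the K preceding frames.

```python
# Hedged sketch only: per-frame log-likelihoods of each speaker model are
# smoothed over the current and the previous K frames before accumulation,
# so within-model frame scores are pulled together; the thesis's actual
# nonlinear transform may differ from this simple moving average.
import numpy as np

def windowed_model_scores(frame_loglik, K=5):
    """frame_loglik: (n_models, n_frames) array of per-frame log-likelihoods."""
    n_models, n_frames = frame_loglik.shape
    smoothed = np.empty_like(frame_loglik, dtype=float)
    for t in range(n_frames):
        lo = max(0, t - K)
        smoothed[:, t] = frame_loglik[:, lo:t + 1].mean(axis=1)
    return smoothed.sum(axis=1)              # one accumulated score per model

# Identification decision: the model with the largest accumulated score wins,
# e.g. speaker = int(np.argmax(windowed_model_scores(loglik_matrix))).
```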
     3. A robust identification algorithm based on adaptive frequency warping is proposed. The classical Mel-frequency cepstral and perceptual linear prediction features start from the mechanisms of human auditory perception, simulate how the auditory system perceives sound frequency, and improve speaker recognition performance; this processing, however, does not treat semantic features and speaker-specific features differently, but simply lowers the weight of high-frequency information during feature extraction. The adaptive frequency warping algorithm rests on the principle that speaker information is unevenly distributed across frequency bands. It analyzes, from the physiology of speech production, the structural changes that occur during articulation, extracts the physiological characteristics that carry speaker information, and then quantifies the contribution of each frequency band to speaker information at the level of spectral analysis. This quantification guides the design of an adaptive frequency-scale transformation that differs from the Mel scale: regions contributing more speaker information are allocated more filters with narrower bandwidths and higher frequency resolution, while regions contributing less are allocated fewer filters with wider bandwidths and lower frequency resolution. Adaptive spectral filtering is performed in this way to extract the discriminative feature DFCC. In addition, to address the mismatch between training and test speech in real environments, the speech spectrum is pre-enhanced frame by frame and frequency bin by frequency bin to remove noise interference and further improve robustness.
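One plausible way to realize such a warping is sketched below: filter centres are placed at equal increments of the cumulative speaker-information weight, so high-contribution regions receive more, narrower filters. The per-bin weight curve, the filter count, and the helper names adaptive_filterbank and dfcc are assumptions for illustration; the thesis derives the actual weighting from physiological and spectral analysis.

```python
# Illustrative sketch of an adaptive (non-Mel) filterbank: centres follow the
# cumulative distribution of an assumed per-bin speaker-information weight.
import numpy as np
from scipy.fftpack import dct

def adaptive_filterbank(weight, n_filters=24, n_fft=512):
    """weight: speaker-information contribution per FFT bin (n_fft//2 + 1)."""
    cdf = np.cumsum(weight) / np.sum(weight)
    targets = np.linspace(0.0, 1.0, n_filters + 2)
    edges = np.clip(np.searchsorted(cdf, targets), 0, n_fft // 2)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        l, c, r = edges[m - 1], edges[m], edges[m + 1]
        fbank[m - 1, l:c + 1] = np.linspace(0.0, 1.0, c - l + 1)  # rising edge
        fbank[m - 1, c:r + 1] = np.linspace(1.0, 0.0, r - c + 1)  # falling edge
    return fbank

def dfcc(power_spectrum, fbank, n_ceps=13):
    """DFCC-style coefficients: log filterbank energies followed by a DCT."""
    energies = np.log(power_spectrum @ fbank.T + 1e-10)
    return dct(energies, type=2, norm="ortho", axis=-1)[..., :n_ceps]
```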
     4. A speaker identification method based on Chinese vowel mapping is proposed, studying speaker recognition for Chinese speech from the characteristics of the language. Chinese has a relatively stable syllable structure in which the vowel portion carries most of the energy and duration. On this basis, the structure and articulation of Chinese Pinyin are analyzed, and through vowel-spectrum comparison, phoneme sliding analysis, final-decomposition experiments, and formant analysis, the vowel portion of a final is decomposed, at the level of short-time frames, into a combination of single-vowel phonemes. Combined with extensive phonetic knowledge, a Chinese vowel mapping table is constructed. Through this mapping, the semantic information and the speaker-identity information in the speech signal can be effectively separated, and the text-independent speaker recognition problem is converted into a recognition problem involving a limited set of single-vowel phonemes, from which a new speaker modeling method and a new recognition framework are derived, improving the recognition rate while reducing the dependence on the duration of training and test data. Within this framework, a speaker identification algorithm based on biomimetic pattern recognition is proposed: in the training stage an improved nearest-neighbor covering algorithm builds an effective cover for each single-vowel phoneme, and in the recognition stage the decision is made according to whether the test vowel frames fall within the corresponding cover. Under open-set test conditions, the algorithm discriminates well against impostors.
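The sketch below is a much-simplified stand-in for this framework: the mapping entries and the distance-threshold cover are illustrative assumptions only (the thesis builds a complete mapping table and an improved nearest-neighbor covering in the sense of biomimetic pattern recognition), but it shows how per-speaker, per-phoneme covers and an open-set rejection rule fit together.

```python
# Simplified illustration only: toy vowel-mapping entries, per-speaker covers
# built from training frames of each single-vowel phoneme, and a coverage-rate
# decision with impostor rejection; all names and thresholds are assumptions.
import numpy as np

VOWEL_MAP = {"ai": ["a", "i"], "ei": ["e", "i"], "ao": ["a", "o"]}  # examples

class VowelCoverModel:
    """Keeps, per phoneme, the training frames that define a speaker's cover."""
    def __init__(self, radius=2.0):
        self.radius = radius
        self.covers = {}                     # phoneme -> (n_points, dim) array

    def fit(self, frames_by_phoneme):
        self.covers = {p: np.asarray(f) for p, f in frames_by_phoneme.items()}

    def covered(self, phoneme, frame):
        pts = self.covers.get(phoneme)
        if pts is None:
            return False
        return np.linalg.norm(pts - frame, axis=1).min() <= self.radius

def identify(models, test_frames, reject_ratio=0.3):
    """models: {speaker: VowelCoverModel}; test_frames: list of (phoneme, frame).
    The speaker whose covers capture the largest share of frames wins; if no
    model covers enough frames, the utterance is rejected as an impostor."""
    rates = {s: float(np.mean([m.covered(p, f) for p, f in test_frames]))
             for s, m in models.items()}
    best = max(rates, key=rates.get)
    return best if rates[best] >= reject_ratio else None
```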
Speech is a major source of information for people and the most convenient, effective, and natural communication tool. Speech recognition aims to identify the content of speech in order to facilitate the exchange between people and machines. Speaker recognition is a special form of speech recognition in which a machine recognizes a person from a spoken phrase. Speaker recognition technology has made great progress over the past thirty years; at the same time, as practical applications have developed, higher performance is required. On the one hand, the variability of a speaker's pronunciation makes extracting discriminative features the key to ensuring system performance. On the other hand, many disturbing factors, such as noisy environments, the length of training and test data, and communication-channel mismatch, seriously degrade the performance of speaker recognition in practical applications. This dissertation focuses on text-independent speaker identification, covering the extraction of speaker characteristics and noise robustness. The main research results cover four aspects:
     1. A speaker identification algorithm based on feature transformation and a fuzzy least-squares support vector machine (LS-SVM) is presented to overcome the limitation of the LS-SVM with large amounts of speech data. Training an LS-SVM requires solving a set of linear equations whose number of variables equals the number of training samples, so this dissertation proposes a feature transformation based on the Gaussian mixture model (GMM). Simultaneously, a fuzzy membership function is introduced into the LS-SVM to deal with the unclassifiable regions of the multi-class problem. The GMM is a classical generative model that effectively reduces the amount of feature data and highlights speaker characteristics, because the clustering result is the set of Gaussian mean vectors. The proposed algorithm combines the advantages of generative and discriminative models. Experimental results demonstrate that the fuzzy LS-SVM has better discriminative and generalization ability.
     2. A noise-robust method of perceptual feature compensation transformation based on the Gaussian mixture model is proposed. Following the analysis of human auditory perception, the perceptual linear prediction (PLP) model takes several steps to reflect how humans perceive sound. This work modifies PLP in the feature extraction stage by removing the critical-band spectral analysis and then extracts the modified perceptual log area ratio (MPLAR). Furthermore, according to the acoustic characteristics of speaker recognition, a nonlinear transformation is applied to the output likelihood scores; it widens the score ratio between the target and non-target models and, considering the overall score distribution, keeps the per-frame scores of the same model close together. Each model score thus depends not only on the current likelihood but also on the likelihoods of the previous K frames, which overcomes the limited robustness of the MPLAR feature under different noise environments. The method based on perceptual features and model compensation provides discriminative features, stabilizes the model scores, and improves the recognition rate and robustness of the system.
     3. A robust algorithm based on self-adaptive frequency warping is introduced. Although the Mel-frequency and perceptual linear prediction features take human auditory perception into account and improve recognition performance to some extent, they do not treat semantic information and speaker-specific characteristics differently and pay little attention to high-frequency information. This dissertation presents a new discriminative feature based on adaptive frequency warping. The relationship between frequency components and individual characteristics is analyzed and quantified. The new feature is extracted with non-uniform sub-band filters designed according to the adaptive frequency warping in different frequency bands. Furthermore, a pre-enhancement stage is applied before feature extraction. A series of controlled experiments shows that the warping algorithm is reasonable and interpretable and that the proposed feature is insensitive to the spoken content and thus more discriminative and robust. The experimental results demonstrate that combining pre-enhancement with the proposed feature noticeably improves the speaker recognition rate and robustness.
     4. A novel framework for speaker recognition based on a Chinese vowel mapping technique is proposed. The basis of this framework is the decomposition of Chinese compound vowels into single-vowel phonemes. In Chinese pronunciation, all syllables have a simple and stable phonetic structure, and the vowel part carries most of the energy and duration. We find that diphthongs and compound vowels in Chinese can approximately be regarded, from the viewpoint of short-time analysis, as combinations of single vowels and transitional parts, and we build a new Chinese vowel mapping table from compound vowels to single-vowel phonemes. Based on this table, personal identity information is separated from semantic information, which provides a novel way to transform a text-independent system into a text-dependent speaker recognition system and can be reused by industry and other researchers. In the new framework, we propose a Chinese speaker identification system based on biomimetic pattern recognition and improve the nearest-neighbor algorithm to find an effective cover of each phoneme in the feature space for every speaker. During identification, the final decision is made according to the relation between the cover and the test features. Experimental results demonstrate that the Chinese vowel mapping theory is valid and meaningful and that the new system effectively reduces the required amount of data and resists interference from impostors.
