Speaker Recognition Based on the Time-Varying Characteristics of Speech Signals
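Abstract
Speaker recognition is a special kind of speech recognition. In recent years the technology has developed rapidly, and text-dependent speaker verification systems have been deployed in places that require identity checks. Some problems remain open, however. The key one is which features of the speech signal describe a speaker effectively and reliably.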
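     Speaker recognition comprises speaker verification and speaker identification; this paper mainly studies text-dependent speaker identification. Based on the time-varying characteristics of the speech signal, characteristic frequencies that vary with time (including the time-varying pitch frequency) are extracted from the average MEL cepstrum, which yields time series formed from the cepstrum values at each characteristic frequency. Time series preprocessing and statistical methods are used to separate each series into a trend component and a random fluctuation component. The random fluctuation is a zero-mean, autocovariance-nonstationary time series. It is analyzed with the full-order Time-Varying Parameter Autoregressive (TVPAR) model to extract further feature parameters of the speaker's speech signal. Speaker recognition is then studied separately on the random fluctuation series and on the results of the full-order TVPAR analysis.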
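The separation of a series into a trend component and a zero-mean fluctuation can be illustrated with a minimal sketch. The abstract does not state which trend estimator is used, so the centered moving average below, the function name `split_trend_and_fluctuation`, and the `window` parameter are all assumptions chosen for illustration only.

```python
import numpy as np

def split_trend_and_fluctuation(series, window=5):
    """Split a 1-D series into a smooth trend and a zero-mean residual.

    Assumption: a centered moving average stands in for the trend
    estimator; the source does not specify its actual method.
    """
    x = np.asarray(series, dtype=float)
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    # Repeat the edge values so the trend has the same length as x.
    padded = np.pad(x, half, mode="edge")
    kernel = np.ones(window) / window
    trend = np.convolve(padded, kernel, mode="valid")
    fluctuation = x - trend
    # Move any leftover offset into the trend so the residual is zero-mean
    # while trend + fluctuation still reproduces x exactly.
    offset = fluctuation.mean()
    return trend + offset, fluctuation - offset
```

The residual returned here plays the role of the random fluctuation series; whether it is autocovariance-nonstationary depends on the data, not on this decomposition.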
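     The order of the regression model is determined by the minimum BIC (Bayesian Information Criterion) rule, and speakers are finally discriminated by Mahalanobis distance. Experiments show that recognition with the full-order TVPAR model gives a considerably higher recognition rate than recognition on the random fluctuation series. With the full-order TVPAR model, the recognition rate reaches 97.3% with one characteristic frequency and 98.6% with two.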
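As a rough illustration of order selection by minimum BIC, the sketch below fits ordinary constant-coefficient AR models by least squares and keeps the order with the smallest BIC. This is a simplification: it is not the full-order TVPAR model of the abstract, and the function names, the `max_order` parameter, and the BIC form `n * ln(noise_var) + p * ln(n)` are assumptions made for the example.

```python
import numpy as np

def fit_ar_least_squares(x, order, start):
    """Fit x[t] = sum_k a[k] * x[t-k-1] + e[t] for t >= start.

    Returns the coefficients and the mean squared residual.
    Requires start >= order so every lag is available.
    """
    x = np.asarray(x, dtype=float)
    targets = x[start:]
    if order == 0:
        return np.zeros(0), float(np.mean(targets ** 2))
    # Column k holds the series lagged by k + 1 samples.
    lags = np.column_stack(
        [x[start - k - 1: len(x) - k - 1] for k in range(order)]
    )
    coeffs, *_ = np.linalg.lstsq(lags, targets, rcond=None)
    residuals = targets - lags @ coeffs
    return coeffs, float(np.mean(residuals ** 2))

def select_order_by_bic(x, max_order):
    """Return (order, bic) minimizing BIC over orders 0..max_order."""
    x = np.asarray(x, dtype=float)
    # Use the same effective sample for every order so BICs are comparable.
    n = len(x) - max_order
    best_order, best_bic = 0, np.inf
    for p in range(max_order + 1):
        _, noise_var = fit_ar_least_squares(x, p, start=max_order)
        # Guard against log(0) on perfectly predictable series.
        noise_var = max(noise_var, np.finfo(float).tiny)
        bic = n * np.log(noise_var) + p * np.log(n)
        if bic < best_bic:
            best_order, best_bic = p, bic
    return best_order, best_bic
```

The `p * ln(n)` term penalizes extra coefficients, so a higher order is kept only when it reduces the residual variance enough to pay for itself.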
Speaker recognition is a special kind of speech recognition. In recent years the technology has developed rapidly, and text-dependent speaker verification systems have been used in settings that require identity authentication. Some problems remain to be solved, however. The key one is which characteristics of the speech signal describe a speaker effectively and reliably.
     Speaker recognition comprises speaker verification and speaker identification. This paper focuses on text-dependent speaker identification. Based on the time-varying characteristics of the speech signal, time-varying characteristic frequencies (pitch frequency included) are extracted from the average MEL cepstrum, and time series of the cepstrum values at these characteristic frequencies are obtained. The deterministic and stochastic fluctuations of the time series are separated by time series preprocessing and statistical methods. The stochastic fluctuations, which form a zero-mean autocovariance-nonstationary time series, are analyzed with the full-order TVPAR (Time-Varying Parameter Autoregressive) model, and further characteristic parameters of the speaker's speech signal are extracted. Speakers are then recognized both on the stochastic fluctuations of the time series and on the analysis with the full-order TVPAR model.
     In this paper, the order of the regression model is selected by the minimum BIC (Bayesian Information Criterion) rule, and speakers are discriminated by Mahalanobis distance. The experimental results show that the recognition rate obtained with the full-order TVPAR model is higher than that obtained on the stochastic fluctuations of the time series alone. With one and two characteristic frequencies, the average recognition rate reaches 98.6% and 100%, respectively.
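Discrimination by Mahalanobis distance can be sketched as a nearest-model classifier: each enrolled speaker is summarized by the mean and covariance of their feature vectors, and a test vector is assigned to the speaker whose model is closest. The abstract does not describe how the feature vectors or covariances are formed, so the function names, the per-speaker covariance, and the use of a pseudo-inverse below are assumptions for illustration.

```python
import numpy as np

def fit_speaker_model(features):
    """Summarize one speaker by mean and inverse covariance.

    features: array of shape (n_samples, n_dims), one row per utterance.
    """
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0)
    cov = np.atleast_2d(np.cov(features, rowvar=False))
    # Pseudo-inverse keeps the sketch usable if cov is singular.
    return mean, np.linalg.pinv(cov)

def mahalanobis_squared(x, mean, inv_cov):
    """Squared Mahalanobis distance from x to a speaker model."""
    d = np.asarray(x, dtype=float) - mean
    return float(d @ inv_cov @ d)

def identify_speaker(x, models):
    """Return the key in models (name -> (mean, inv_cov)) nearest to x."""
    return min(models, key=lambda name: mahalanobis_squared(x, *models[name]))
```

A hypothetical use would build `models = {"speaker_a": fit_speaker_model(train_a), "speaker_b": fit_speaker_model(train_b)}` from enrollment data and call `identify_speaker(test_vector, models)` for each test utterance.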
