Research on Methods for Text-Dependent Speaker Recognition

Author: Jiang Shaojun (姜绍君)
Thesis level: Master's
Discipline: Signal and Information Processing
Keywords: speaker recognition; feature extraction; pattern recognition; statistical learning theory; support vector machine; hidden Markov model
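Degree year: 2001
Advisor: Yin Fuliang (殷福亮)
Discipline code: 081002
Degree-granting institution: Dalian University of Technology (大连理工大学)
Thesis submission date: 2001-03-10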

Abstract

Speaker recognition is an important area of speech processing, with wide application in human-machine interfaces, security, military, and judicial settings. This thesis studies methods for text-dependent speaker recognition. The main work is as follows:
    (1) Based on the discrete-time model of speech production, the LPC cepstrum is extracted as the speaker recognition feature and then vector quantized.
    (2) Statistical learning theory is studied, together with the pattern recognition method built on it, the support vector machine. Topics covered include the formulation of the machine learning problem, empirical risk minimization, bounds on generalization ability, structural risk minimization, the optimal separating hyperplane, the generalized optimal separating hyperplane, and the optimal separating hyperplane in a high-dimensional space, that is, the support vector machine.
    (3) The hidden Markov model recognition method is studied, including the physical meaning of hidden Markov models of speech signals, the definition of a Markov chain, the types of hidden Markov models, parameter estimation methods, and issues that arise in implementing hidden Markov model algorithms.
    (4) Finally, both recognition methods are simulated on a speech database built by the author, and the experimental results are given.
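
As a rough illustration of item (1), the sketch below computes LPC cepstral coefficients for one speech frame using the autocorrelation method, the Levinson-Durbin recursion, and the standard LPC-to-cepstrum recursion. It is not taken from the thesis: the function names, the Hamming window, the predictor order of 12, and the 16 cepstral coefficients are all assumptions chosen for illustration, and the vector quantization step is omitted.

```python
import numpy as np


def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations.

    Returns a[1..p] for the predictor s[n] ~ sum_k a[k] * s[n-k],
    i.e. the all-pole model 1 / (1 - sum_k a[k] z^-k).
    """
    a = np.zeros(order + 1)  # a[0] is unused
    err = r[0]               # prediction error energy
    for i in range(1, order + 1):
        # Reflection coefficient for stage i
        acc = r[i] - np.dot(a[1:i], r[i - 1:0:-1])
        k = acc / err
        a_new = a.copy()
        a_new[i] = k
        a_new[1:i] = a[1:i] - k * a[i - 1:0:-1]
        a = a_new
        err *= 1.0 - k * k
    return a[1:]


def lpc_to_cepstrum(a, n_ceps):
    """Convert predictor coefficients a[1..p] to cepstrum c[1..n_ceps].

    Uses c_n = a_n + sum_{k} (k/n) * c_k * a_{n-k}, with a_n = 0 for n > p.
    """
    p = len(a)
    c = np.zeros(n_ceps + 1)  # c[0] (gain term) is not computed here
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


def frame_lpc_cepstrum(frame, order=12, n_ceps=16):
    """LPC cepstrum of one frame (order and n_ceps are assumed values)."""
    x = frame * np.hamming(len(frame))
    # Short-time autocorrelation for lags 0..order
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    a = levinson_durbin(r, order)
    return lpc_to_cepstrum(a, n_ceps)
```

For a single-pole predictor with a_1 = 0.5, `lpc_to_cepstrum` returns c_n = 0.5^n / n, which matches the series expansion of the logarithm of 1 / (1 - 0.5 z^-1).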
