Research on Speaker Verification of Conversational Telephone Speech
Abstract
Text-independent speaker verification is an important research direction in speaker recognition. To measure the state of the art in this field, the U.S. National Institute of Standards and Technology (NIST) has organized Speaker Recognition Evaluations (SRE) since 1996, providing all participating sites with common data, a common test platform and evaluation rules, and defining a number of sub-tasks that explore recognition methods under different speech conditions. Speaker verification on summed-channel conversational telephone speech, one of the NIST SRE sub-tasks, is therefore of considerable research interest.
This thesis starts from basic single-speaker verification techniques: the framework of a single-speaker verification system is introduced and each of its components is described in detail. Then, guided by the similarities and differences between single-speaker verification and verification on conversational speech, the recognition system is improved in two respects. Concerning the difference, conversational speech must first be segmented and clustered by speaker; this step is the key to converting the conversational task into a conventional single-speaker verification task. The thesis therefore reviews the segmentation and clustering methods in common use, analyses their shortcomings, and presents several improvements that make them better suited to the subsequent verification stage.
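As an illustration of the segmentation step, the sketch below shows a conventional BIC-based speaker change test between two adjacent windows of acoustic features (e.g. MFCCs). This is a generic textbook formulation, not the detector developed in the thesis; the penalty weight `lam` and the decision rule are illustrative assumptions.

```python
import numpy as np

def delta_bic(x, y, lam=1.0):
    """BIC-based change score between two adjacent feature windows.

    x, y : (n_frames, dim) arrays of acoustic features (e.g. MFCCs).
    A positive value means that modelling the two windows with two
    separate full-covariance Gaussians fits better than one Gaussian,
    i.e. a speaker change point between them is likely.
    """
    z = np.vstack([x, y])
    n_x, n_y, n_z = len(x), len(y), len(z)
    d = z.shape[1]

    def logdet_cov(a):
        # log-determinant of the sample covariance matrix
        _, logdet = np.linalg.slogdet(np.cov(a, rowvar=False))
        return logdet

    # model-complexity penalty for the extra Gaussian (mean + full covariance)
    penalty = 0.5 * lam * (d + 0.5 * d * (d + 1)) * np.log(n_z)

    return (0.5 * n_z * logdet_cov(z)
            - 0.5 * n_x * logdet_cov(x)
            - 0.5 * n_y * logdet_cov(y)
            - penalty)

# a change point is hypothesised wherever delta_bic(left, right) > 0
```

In practice the two windows slide along the conversation and local maxima of the score above zero are taken as change-point candidates.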
Concerning the similarity, any measure that improves a single-speaker verification system also improves the conversational-speech verification system accordingly. Borrowing the notion of speech quality measurement, the thesis introduces a new scoring criterion that considerably improves recognition performance.
This thesis proposes two re-processing methods for the initial segmentation and clustering output that are well suited to the subsequent verification stage (illustrative sketches follow the two items below):

First, a re-purification method. Since the purity of the speech is one of the key factors affecting recognition performance, the segmentation and clustering of the original conversation no longer aims at locating the speaker change points exactly, but at maximizing the purity of each speaker's segments: low-purity portions are discarded and only high-purity portions are passed to the verification system, thereby improving its performance.

Second, a fusion-based segmentation method. The outputs of several different segmentation methods are compared to find the regions on which they agree; these regions are used to train the two speaker models, after which the regions on which they disagree are re-assigned, finally yielding accurate speech for each of the two speakers.
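A minimal sketch of the first method (re-purification), assuming the two speakers of the conversation have already been modelled, e.g. with scikit-learn Gaussian mixtures: each segment from the initial diarization is scored against both models, and only segments whose likelihood gap is large, i.e. whose purity is high, are kept for verification. The threshold and helper names are hypothetical and not the thesis implementation.

```python
from sklearn.mixture import GaussianMixture

def purify_segments(segments, gmm_a, gmm_b, threshold=2.0):
    """Keep only segments that clearly belong to one of the two speakers.

    segments    : list of (n_frames, dim) feature arrays produced by the
                  initial segmentation/clustering step.
    gmm_a, gmm_b: GaussianMixture models of the two hypothesised speakers.
    threshold   : minimum per-frame log-likelihood gap treated as "pure".
    Returns (kept_for_a, kept_for_b).
    """
    kept_a, kept_b = [], []
    for seg in segments:
        ll_a = gmm_a.score(seg)   # average log-likelihood per frame
        ll_b = gmm_b.score(seg)
        gap = ll_a - ll_b
        if gap > threshold:       # confidently speaker A
            kept_a.append(seg)
        elif gap < -threshold:    # confidently speaker B
            kept_b.append(seg)
        # mixed / low-purity segments are discarded
    return kept_a, kept_b
```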
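And a sketch of the second method (fusion of segmentations): frames on which two segmentation hypotheses agree are taken as reliable data for training the two speaker models, and the disputed frames are then re-assigned to the better-fitting model. The thesis relabels the disagreeing regions with a Viterbi pass; the frame-wise maximum-likelihood assignment below is a simplification, and the function name and mixture size are assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fuse_segmentations(features, labels_1, labels_2, n_mix=32):
    """Fuse two frame-level segmentation hypotheses of a two-speaker call.

    features           : (n_frames, dim) acoustic features of the call.
    labels_1, labels_2 : integer arrays (0 or 1) of the same length, the
                         outputs of two different segmentation methods,
                         assumed to use a consistent speaker numbering.
    Returns a fused (n_frames,) label array.
    """
    agree = labels_1 == labels_2

    # train one GMM per speaker on the frames both methods agree on
    gmms = [GaussianMixture(n_components=n_mix, covariance_type='diag')
            .fit(features[agree & (labels_1 == spk)]) for spk in (0, 1)]

    # re-assign the disputed frames to whichever model explains them better
    fused = labels_1.copy()
    disputed = ~agree
    ll = np.stack([g.score_samples(features[disputed]) for g in gmms], axis=1)
    fused[disputed] = np.argmax(ll, axis=1)
    return fused
```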
Using the concept of speech quality measurement, the thesis proposes a new quality-based scoring method within the GMM-UBM framework. With the help of auxiliary side information, the degree of match between each test utterance and the speaker model (a quality value) is measured dynamically, and this value is used to adjust the system score, leading to a marked improvement in recognition performance.
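A minimal sketch of the quality-based scoring idea in the GMM-UBM framework: the usual average log-likelihood ratio of a test utterance is scaled by a per-utterance quality value derived from auxiliary side information. The particular quality measure and the linear adjustment used here are illustrative assumptions; the thesis derives its own quality value and adjustment.

```python
def llr_score(features, speaker_gmm, ubm):
    """Average frame log-likelihood ratio of a GMM-UBM system.

    speaker_gmm and ubm are any models exposing a score(features) method
    that returns the average log-likelihood per frame (e.g. scikit-learn
    GaussianMixture objects).
    """
    return speaker_gmm.score(features) - ubm.score(features)

def quality_adjusted_score(features, speaker_gmm, ubm, quality, alpha=0.5):
    """Adjust the LLR with a per-utterance quality value in [0, 1].

    quality : how well the test utterance matches the model conditions,
              estimated from auxiliary side information (for instance
              SNR, duration, or the UBM likelihood of the utterance).
    alpha   : strength of the adjustment (illustrative value).
    Low-quality utterances have their scores shrunk towards zero so that
    they influence the accept/reject decision less strongly.
    """
    base = llr_score(features, speaker_gmm, ubm)
    return base * (alpha + (1.0 - alpha) * quality)
```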
Finally, the thesis builds a speaker verification system for conversational telephone speech with stable performance; extensive experiments show that the system is effective and robust.
Text-independent speaker verification is one of the most important research directions in speaker recognition. To guide research efforts and calibrate the technical capabilities of text-independent speaker recognition, the National Institute of Standards and Technology (NIST) has coordinated Speaker Recognition Evaluations (SRE) since 1996. To explore suitable solutions under different conditions, NIST defines a number of tasks and supplies common speech data, evaluation criteria, and so on. As one of the NIST SRE tasks, speaker verification on summed-channel conversations is a meaningful direction for research.
The single-speaker verification system is first introduced in detail. Then, by comparing single-speaker verification with conversational-speech verification, this thesis makes improvements from two points of view:

First, compared with single-speaker verification, segmentation and clustering is the key additional operation in conversational-speech verification. This thesis therefore reviews segmentation and clustering methods and proposes several ways to improve the segmentation results, which yield cleaner speech segments and better verification results.

Second, methods that are effective for a single-speaker verification system can also improve the performance of conversational-speech verification. The concept of speech quality is introduced, and a new scoring method is proposed to improve verification performance.
Specifically, this thesis proposes two methods to improve the segmentation results:

First, after segmentation and clustering, the purity of each speech segment is calculated; segments with low scores are discarded, while segments with high scores, which correspond to clean speech, are retained and used for verification.

Second, a segmentation method based on a fusion strategy is proposed. The speech is first segmented with two or more segmentation methods and the results are fused. Regions where the results agree can be regarded as clean speech and are used to train the two speaker models; regions where the results disagree are then relabeled with the Viterbi algorithm.
To further improve system performance, the concept of speech quality is introduced and a new scoring method based on the GMM-UBM system is proposed. By integrating auxiliary side information, the log-likelihood ratio is adjusted dynamically. Experiments indicate the good performance of the proposed method.

Finally, this thesis builds a two-speaker (summed-channel) verification system and reports its performance under several different conditions; the results demonstrate the robustness and validity of the system.
