Research on Information Fusion Algorithms in Speaker Recognition

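English title: A Research on Information Fusion Algorithms in Voice Biometrics
Author: Liu Di
Degree level: Doctoral
Major: Information Security
Chinese keywords: biometrics; information fusion; speaker recognition; feature-level fusion; matching-score-level fusion; decision-level fusion
English keywords: Biometrics; Information fusion; Speaker recognition; Feature level fusion; Matching-score level fusion; Decision-making level fusion
Degree year: 2011
Supervisors: Qiu Zhengding; Sun Dongmei
Discipline code: 081002
Degree-granting institution: Beijing Jiaotong University
Submission date: 2011-06-01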

Abstract

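Abstract: By studying feature-level fusion, matching-score-level fusion, decision-level fusion, and multi-level fusion algorithms for speaker verification systems, this thesis further improves the verification accuracy of such systems, with the aim of better addressing public information security problems that bear on the national economy and people's livelihood. To address the problems found at each fusion level, the thesis pursues three lines of work: establishing a theory of feature-level fusion, proposing a feature selection algorithm for matching-score-level fusion, and proposing a theory of multi-level fusion.
     The main contributions of this thesis are: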
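     1. Following the three fusion levels found in information-fusion biometric systems (feature level, matching-score level, and decision level), the thesis surveys existing information-fusion speaker recognition algorithms and divides matching-score fusion algorithms for speaker recognition into subcategories. Then, addressing the problems and shortcomings encountered at each fusion level, the specific work is as follows: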
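     2. For the feature level, the thesis proposes a feature-level fusion algorithm based on a Relation Measurement Fusion framework. Using this framework as its theoretical basis, it builds a speaker verification algorithm based on feature-level fusion and introduces the maximum Kullback-Leibler distance to compute the effective information content of feature-level fusion, explaining for the first time, from an information-theoretic standpoint, why feature-level fusion outperforms the matching-score fusion commonly used in speaker recognition. Experimental results show that the feature-level fusion algorithm captures more effective information than conventional matching-score fusion and performs better than both matching-score fusion and unimodal algorithms. In the best case, the equal error rate (EER) of the feature-level fusion algorithm is 3.88% lower than that of conventional matching-score fusion and 7.3% lower than that of the unimodal algorithm.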
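     3. For the matching-score level, the thesis proposes a feature selection algorithm for matching-score fusion based on the Spearman correlation coefficient. In matching-score fusion, choosing two matching scores with low mutual correlation is the key to improving the performance of the fused system, yet the field currently lacks a measure of this correlation. The thesis introduces, for the first time, the Spearman correlation coefficient to measure the correlation between matching scores, and validates it by fitting polynomial curves to the relationship between the Spearman coefficient and the EER and to the relationship between the Spearman coefficient and MinDCF. It further uses the Kullback-Leibler distance to analyze its relationship with the Spearman coefficient, validating the coefficient a second time. By using the Spearman coefficient to evaluate experimentally the matching-score correlation of all 15 two-feature fusion schemes formed from 6 speech features, it selects the optimal features for fusion and identifies MFCC combined with residual phase as the best scheme. Finally, a comparison of computation time with other typical correlation measures shows that the Spearman coefficient is the fastest, making it suitable for rapid selection among large numbers of speech features.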
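     4. For decision-level fusion, the thesis proposes, for the first time, a multi-level fusion framework for speaker recognition, within which four multi-level fusion concepts are defined: one strong multi-level fusion and three weak multi-level fusions. Using a two-feature fusion case, it discusses each of these four cases and proposes a two-feature multi-level fusion algorithm that combines matching-score-level and decision-level fusion, demonstrating the feasibility of the multi-level fusion theory. Experimental results show that the algorithm outperforms both conventional matching-score fusion and unimodal algorithms. In the best case, it reduces the EER by 18.63% relative to a conventional unimodal system.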
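The abstract does not spell out the Relation Measurement Fusion framework, so the following is only a minimal sketch of the most generic form of feature-level fusion (serial concatenation), not the thesis's method. It assumes two frame-aligned feature streams (for example MFCC and residual-phase vectors extracted at the same frame rate) and z-normalizes every dimension before concatenating, so that neither stream dominates by scale. The function names are hypothetical.

```python
from statistics import fmean, pstdev


def znorm_columns(frames):
    """Z-normalize every dimension of a list of equal-length feature vectors.

    Statistics come from the frames themselves here; in practice they would
    be estimated on training data (assumption).
    """
    stats = [(fmean(dim), pstdev(dim) or 1.0) for dim in zip(*frames)]
    return [[(x - m) / s for x, (m, s) in zip(frame, stats)] for frame in frames]


def concat_fusion(frames_a, frames_b):
    """Serial (concatenation) feature-level fusion of two frame-aligned streams."""
    if len(frames_a) != len(frames_b):
        raise ValueError("streams must contain the same number of frames")
    za, zb = znorm_columns(frames_a), znorm_columns(frames_b)
    # Each fused frame is [normalized stream-A dims..., normalized stream-B dims...]
    return [a + b for a, b in zip(za, zb)]


# Toy example: two 2-D frames from stream A, two 1-D frames from stream B.
fused = concat_fusion([[1.0, 2.0], [3.0, 4.0]], [[10.0], [20.0]])
print(fused)  # [[-1.0, -1.0, -1.0], [1.0, 1.0, 1.0]]
```

The fused vectors would then feed a single speaker model, whereas matching-score fusion trains one model per stream and combines only their output scores.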
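Item 3 selects the feature pair whose matching scores are least correlated. A minimal sketch of that selection step, assuming each feature's system has scored the same ordered list of trials and taking the smallest absolute Spearman coefficient as the criterion (the handling of the sign is an assumption). Spearman's coefficient is computed as the Pearson correlation of average ranks, which handles ties; with 6 features, `itertools.combinations` yields exactly the 15 candidate pairs. Feature names and scores are hypothetical.

```python
from itertools import combinations


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def least_correlated_pair(scores):
    """scores: feature name -> matching scores on the same ordered trial list."""
    pairs = combinations(sorted(scores), 2)
    return min(pairs, key=lambda p: abs(spearman(scores[p[0]], scores[p[1]])))


# Toy scores on four trials (hypothetical feature names and values).
scores = {"a": [1, 2, 3, 4], "b": [1, 2, 4, 3], "c": [2, 4, 1, 3]}
print(least_correlated_pair(scores))  # ('a', 'c')
```

Because it only needs a sort per score list, the rank-based coefficient is cheap to compute, which fits the abstract's point about fast selection among many features.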
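Item 4 chains matching-score-level and decision-level fusion. The four multi-level fusion cases are not detailed in the abstract, so the following is a hypothetical illustration of one way to chain the two levels: z-normalize and weight-sum the two systems' scores (score level), then take a majority vote over the two unimodal decisions and the fused decision (decision level). The normalization statistics, the weight `w`, and the per-system thresholds are assumptions that would normally be estimated on development data.

```python
from dataclasses import dataclass


@dataclass
class ScoreNorm:
    """Z-norm parameters, e.g. mean/std of impostor scores on development data (assumption)."""
    mean: float
    std: float

    def __call__(self, score: float) -> float:
        return (score - self.mean) / self.std


def multilevel_accept(s_a, s_b, norm_a, norm_b, thr_a, thr_b, thr_fused, w=0.5):
    """Hypothetical two-feature multi-level fusion.

    Score level: weighted sum of the z-normalized scores of systems A and B.
    Decision level: majority vote over decision(A), decision(B), decision(fused).
    """
    za, zb = norm_a(s_a), norm_b(s_b)
    fused = w * za + (1.0 - w) * zb
    votes = int(za >= thr_a) + int(zb >= thr_b) + int(fused >= thr_fused)
    return votes >= 2


unit = ScoreNorm(0.0, 1.0)
# A accepts, B rejects, so the fused score (0.75) casts the deciding vote.
print(multilevel_accept(2.0, -0.5, unit, unit, 1.0, 1.0, 0.5))  # True
print(multilevel_accept(2.0, -0.5, unit, unit, 1.0, 1.0, 1.0))  # False
```

Each decision gets its own threshold on purpose: with one shared threshold and 0 <= w <= 1, the fused score lies between the two unimodal scores, so the vote would always collapse to the fused decision alone.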
ABSTRACT: This thesis aims to improve the performance of voice biometrics systems by investigating feature level fusion, matching-score level fusion, decision-making level fusion and multiple level fusion algorithms, in order to further address problems related to public security. Through a discussion of fusion frameworks at three different levels, the thesis improves system accuracy in three respects: the establishment of feature level fusion, feature selection for matching-score level fusion, and multiple level fusion. The main contributions are as follows:
     1. Following the three fusion levels, it first summarizes current information fusion algorithms for speaker recognition and then divides matching-score level fusion into subcategories. By investigating the problems encountered at each fusion level, it makes the following contributions:
     2. For the feature level, a feature level fusion algorithm for speaker verification based on a Relation Measurement Fusion framework has been proposed, which is superior to the existing fusion methods. Based on the robustness and effectiveness of the Relation Measurement Fusion framework, feature level fusion for speaker verification is established. To show the advantage of feature level fusion, the maximum Kullback-Leibler distance is introduced for the first time to measure the information content of feature level and matching-score level fusion. The experimental results indicate that feature level fusion retains more discriminative information and therefore obtains lower EER and MinDCF than the existing matching-score level fusion and unimodal algorithms. In the best case, the EER of the proposed algorithm is 3.88% lower than that of matching-score level fusion and 7.3% lower than that of the unimodal algorithm.
     3. For the matching-score level, a feature selection algorithm for matching-score level fusion based on the Spearman rank correlation coefficient has been proposed. Fusion techniques that combine different features have been employed, but so far no metric has been used to measure the correlation between combined features in matching-score level fusion. This thesis therefore uses the Spearman rank correlation coefficient as a metric of correlation for matching-score level fusion in speaker recognition. With this metric, the combination of MFCC and residual phase is selected as the optimal pair. Polynomial curve fitting is then employed to describe the relationship between the Spearman coefficient and EER and between the Spearman coefficient and MinDCF, confirming the validity of the Spearman coefficient. After that, the Kullback-Leibler distance is used to verify its validity again. Finally, compared with other correlation metrics, the Spearman coefficient has the lowest computation time.
     4. For the decision-making level, a multiple level fusion framework has been proposed. Based on this framework, one strong multiple level fusion and three weak multiple level fusions have been defined. By discussing these four multiple level fusion cases, a two-feature multiple level fusion algorithm that combines matching-score level fusion and decision-making level fusion has been proposed. The experimental results show that the multiple level fusion theory is feasible and that this algorithm is superior to the current matching-score level fusion and unimodal algorithms, reducing the EER by 18.63% compared with the unimodal algorithm in the best case.
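The results in items 2 to 4 are reported as EER and MinDCF. A minimal sketch of both metrics computed from lists of genuine (target) and impostor scores, assuming higher scores mean "same speaker" and using every observed score as a candidate threshold. The EER is approximated at the threshold where the miss and false-alarm rates are closest; the cost weights (C_miss = 10, C_fa = 1, P_target = 0.01) follow the NIST SRE convention of that period and are an assumption here, since the abstract does not state them, and the cost is left unnormalized.

```python
def error_rates(genuine, impostor, threshold):
    """Miss and false-alarm rates when trials scoring >= threshold are accepted."""
    p_miss = sum(s < threshold for s in genuine) / len(genuine)
    p_fa = sum(s >= threshold for s in impostor) / len(impostor)
    return p_miss, p_fa


def candidate_thresholds(genuine, impostor):
    # Every observed score, plus +inf so that "reject everything" is included.
    return sorted(set(genuine) | set(impostor)) + [float("inf")]


def eer(genuine, impostor):
    """Approximate EER: mean of P_miss and P_fa where the two are closest."""
    _, value = min(
        (abs(pm - pf), (pm + pf) / 2)
        for pm, pf in (error_rates(genuine, impostor, t)
                       for t in candidate_thresholds(genuine, impostor)))
    return value


def min_dcf(genuine, impostor, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Minimum (unnormalized) detection cost over all candidate thresholds."""
    return min(c_miss * pm * p_target + c_fa * pf * (1.0 - p_target)
               for pm, pf in (error_rates(genuine, impostor, t)
                              for t in candidate_thresholds(genuine, impostor)))


genuine = [0.3, 0.6, 0.7, 0.8]
impostor = [0.1, 0.2, 0.4, 0.5]
print(eer(genuine, impostor))      # 0.25
print(min_dcf(genuine, impostor))  # approximately 0.025
```

Sweeping every observed score is quadratic in the number of trials; for large evaluations one would sort once and update the counts incrementally.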
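Items 2 and 3 rely on Kullback-Leibler distances, and the exact "maximum Kullback-Leibler distance" used in the thesis is not defined in the abstract. As a swapped-in, simpler illustration, this sketch fits univariate Gaussians to genuine and impostor scores and evaluates the closed form KL(N(mu1, sigma1^2) || N(mu2, sigma2^2)) = ln(sigma2/sigma1) + (sigma1^2 + (mu1 - mu2)^2) / (2 sigma2^2) - 1/2, summed in both directions; larger values mean better-separated score distributions. The Gaussian assumption and the function names are hypothetical, and both standard deviations must be positive.

```python
import math
from statistics import fmean, pstdev


def kl_gaussian(mu1, sigma1, mu2, sigma2):
    """KL(N(mu1, sigma1^2) || N(mu2, sigma2^2)) in nats (closed form)."""
    return (math.log(sigma2 / sigma1)
            + (sigma1 ** 2 + (mu1 - mu2) ** 2) / (2.0 * sigma2 ** 2)
            - 0.5)


def score_separation(genuine, impostor):
    """Symmetric KL between Gaussians fitted to genuine and impostor scores."""
    g = (fmean(genuine), pstdev(genuine))
    i = (fmean(impostor), pstdev(impostor))
    return kl_gaussian(*g, *i) + kl_gaussian(*i, *g)


print(kl_gaussian(1.0, 1.0, 0.0, 1.0))            # 0.5
print(score_separation([1.0, 3.0], [-1.0, 1.0]))  # 4.0
```

KL divergence is asymmetric, which is why the sketch sums both directions to obtain a single separation value.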

