Person Recognition Based on Audio-Visual Information with Multi-Level Fusion under Smart Room

Original title (Chinese): 智能环境下基于视听信息多层级融合的身份识别
Author: Wu Di (吴迪)
Degree level: Doctoral
Discipline: Control Theory and Control Engineering
Keywords: Smart Environment; Person Recognition; Fusion Criteria; Face Recognition; Speaker Recognition
Degree year: 2014
Supervisor: Cao Jie (曹洁)
Discipline code: 081101
Degree-granting institution: Lanzhou University of Technology (兰州理工大学)
Thesis submission date: 2014-06-03
Defense committee chair: Dang Jianwu (党建武)

Abstract

In recent years, rising security requirements and the rapid development of remote video-conferencing systems have made biometric person recognition in smart environments a research focus in pattern recognition, with wide application in the intelligent visual Internet of Things, public security, financial services, and video-conferencing systems. Because of data noise and the limitations of the recognition systems themselves, the accuracy achievable with a single biometric trait is limited. Researchers have therefore proposed fusing audio and visual information to improve recognition accuracy, an approach that has received wide attention. However, current audio-visual person recognition is largely confined to single-modality recognition under ideal conditions and to simple fusion of audio and video features with existing fusion methods. Little consideration has been given to the effective extraction of single-modality biometric features in complex environments, to the construction of accurate and broadly applicable recognition algorithms, or to determining the optimal fusion algorithm for audio and video features at each fusion level. Starting from the mechanisms of human audio-visual cognition, this thesis studies audio-visual person recognition from three aspects, namely feature extraction, recognition algorithms, and fusion rules, in order to provide a feasible solution for audio-visual recognition in smart environments. The main work and contributions are as follows:
     1. Accurate and highly representative extraction of face and speech features against complex backgrounds
     To address the optimal selection of DCT coefficients of face images, this thesis proposes a DCT coefficient extraction method based on discriminant power analysis: the discriminant power of each DCT coefficient is analyzed, and the coefficients with the larger discriminant power values are extracted as features. Based on an analysis of the geometric and color properties of hair, the Hair feature is applied to face recognition, which extends the diversity of face features. Because the traditional speech parameter MFCC is strongly affected by noise and reflects only the static properties of speech, Gammatone filter cepstral coefficients are extracted using the Gammatone filter bank, which effectively models human auditory characteristics. Since these coefficients also reflect only static properties, Gammatone shifted delta cepstral coefficients, which capture the dynamic properties of speech, are then extracted based on the shifted delta cepstrum.
     2. Face recognition algorithms that effectively solve the "high-dimensional small sample size problem"
     Subspace analysis methods have become the mainstream in face recognition because of their strong descriptive power, good separability, and computational simplicity, but they often face the "high-dimensional small sample size problem", which leads to poor generalization of face recognition systems. To improve the accuracy and robustness of face recognition in smart environments, this thesis combines subspace analysis with the kernel trick and proposes two algorithms in turn: Kernel Relevance Weighted Discriminant Analysis (KRWDA), based on relevance weighted discriminant analysis, and Kernel Discriminant Locality Preserving Projection (KDLPP), based on discriminant locality preserving projection. These algorithms solve the small sample size problem and also overcome the weakness of traditional subspace methods, which are linear in nature, in handling data that are highly linearly non-separable.
     3. A solution to the GMM modeling problem in speaker recognition
     The Gaussian Mixture Model (GMM) is the mainstream algorithm for speaker recognition, and a series of speaker recognition algorithms have been derived from it. When the training speech is short, GMM parameters are insufficiently trained and recognition performance declines rapidly, particularly in unexpected noise environments. To address this, the thesis introduces factor analysis and realizes a GMM with adaptive means. The adaptation of each GMM trained with sufficient data is expressed as a shift factor based on factor analysis; when training data are insufficient, the coordinates of the shift factor are learned from the mixture components that are insensitive to the amount of training data and are then used to compensate the other mixture components. The i-vector speaker recognition system, which is built on the GMM and factor analysis, is the mainstream system at the current research frontier of speaker recognition. To improve its performance in unpredictable noise environments, the thesis improves the locality preserving projection algorithm and uses it for effective dimensionality reduction of i-vectors.
     4. Optimal fusion rules for audio and video features at different levels
     Establishing an optimal fusion rule is the main difficulty in fusion-based recognition, and so far no universal fusion strategy suits every practical situation. Guided by information entropy theory, probability density methods, and decision science, this thesis establishes optimal matching-level fusion rules and addresses the evidence conflict problem of D-S evidence theory. First, after analyzing existing methods for handling conflicting evidence, an evidence combination method based on group decision making and multi-criteria selection fusion is proposed, which effectively resolves the evidence conflict problem. Second, to avoid estimating matching-score densities, the total error probability (TER) is introduced into matching-level fusion to characterize the distribution of matching scores, and uncertainty-measure fusion is introduced into multi-feature fusion recognition. Third, Gaussian densities are used to solve for the optimal weights in additive (weighted sum) fusion, and these weights are applied to logistic-regression rank-level fusion. Finally, to obtain the density functions required in matching-score density fusion, the false acceptance rate (FAR) and false rejection rate (FRR) are introduced to derive belief functions, which are then fused with triangular norm operators, avoiding the need to solve for the weights of additive fusion.
     In summary, the work in this thesis improves the computer's ability to understand complex perceptual information and to process heterogeneous information, further extends the applicable conditions and application scope of multi-biometric fusion person recognition, and improves the robustness and recognition rate of person recognition based on audio-visual multi-feature fusion in smart environments. It is of significance for advancing human-computer interaction technology in China.
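
The evidence conflict problem of D-S evidence theory mentioned in contribution 4 can be illustrated with a minimal Python sketch of the classical Dempster combination rule. This is background only and is not the combination method proposed in the thesis, whose details the abstract does not give. The function name `dempster_combine`, the representation of focal elements as frozensets, and the numeric masses are assumptions made for illustration.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with the classical Dempster rule.

    Each mass function maps a frozenset of hypotheses (a focal element)
    to its mass; the masses of one function are assumed to sum to 1.
    Returns (combined_masses, conflict), where conflict is the total mass
    of all pairs of focal elements whose intersection is empty.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    normalizer = 1.0 - conflict
    return {k: v / normalizer for k, v in combined.items()}, conflict


# Hypothetical identities A, B, C and two recognizers that strongly disagree.
A, B, C = frozenset("A"), frozenset("B"), frozenset("C")
face_evidence = {A: 0.99, B: 0.01}
voice_evidence = {C: 0.99, B: 0.01}

fused, k = dempster_combine(face_evidence, voice_evidence)
print(f"conflict K = {k:.4f}")
print({"".join(sorted(s)): round(v, 4) for s, v in fused.items()})
```

In this example almost all of the mass is in conflict, yet after normalization identity B, which both sources considered very unlikely, receives a belief of about 1. Counter-intuitive results of this kind are what conflict-handling combination methods aim to avoid.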

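The DCT coefficient selection in contribution 1 can be sketched as follows. The abstract does not define how the discriminant power is computed, so this sketch assumes one common formulation: for each coefficient, the ratio of the variance of the class means to the average within-class variance. The function names `discriminant_power` and `select_coefficients` and the synthetic data are hypothetical, and the DCT itself is assumed to have been computed elsewhere.

```python
import numpy as np


def discriminant_power(coeffs, labels):
    """Per-coefficient ratio of between-class to within-class variance.

    coeffs: array of shape (n_samples, n_coeffs), e.g. flattened 2-D DCT
            coefficients of face images (computed elsewhere).
    labels: array of shape (n_samples,) with the identity of each sample.
    """
    coeffs = np.asarray(coeffs, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    class_means = np.stack([coeffs[labels == c].mean(axis=0) for c in classes])
    class_vars = np.stack([coeffs[labels == c].var(axis=0) for c in classes])
    within = class_vars.mean(axis=0)    # average variance inside each class
    between = class_means.var(axis=0)   # variance of the class means
    return between / (within + 1e-12)   # small constant avoids division by zero


def select_coefficients(coeffs, labels, n_keep):
    """Return indices of the n_keep coefficients with the largest discriminant power."""
    dp = discriminant_power(coeffs, labels)
    return np.argsort(dp)[::-1][:n_keep]


# Synthetic data standing in for DCT coefficients: 3 identities, 20 samples each,
# 6 coefficients, of which only coefficients 1 and 4 depend on the identity.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 20)
coeffs = rng.normal(size=(60, 6))
coeffs[:, 1] += 5.0 * labels
coeffs[:, 4] -= 3.0 * labels

selected = select_coefficients(coeffs, labels, n_keep=2)
print(sorted(selected.tolist()))
```

On the synthetic data the two identity-dependent coefficients are ranked highest, which is the behavior the selection step relies on: coefficients that vary between identities but little within an identity are kept as features.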
引文

[1]Kar-Ann Toh. MULTIMODAL BIOMETRICS:AN OVERVIEW AND SOME RECENT DEVELOPMENTS.Biometrics Engineering Research Center Report, March,2012:381-400.
    [2]NIST-National Institute of Standard and Technology[EB/OL]. [2012]. http://www.nist.gov/index.html
    [3]E. Erzin et al., Multimodal Person Recognition for Human-Vehicle Interaction[J].IEEE Multimedia, 2006,13(2):18-31.
    [4]苑玮琦,柯丽等.生物特征识别技术[M].北京：科学出版社,2009.
    [5]Renals S, Bourlard H, Carletta J etc. Multimodal Signal Processing: Human Interactions n Meetings [M].Cambridge, UK:Cambridge University Press,January 2012.
    [6]T. Shivappa, M. Trivedi, D. Rao. Audio-visual Information Fusion In Human Computer Interfaces and Intelligent Environments:A survey[J]. IEEE Proceedings,2011,98(10):1680-1691.
    [7]Massimo Tistarelli, Stan Z. Li, Rama Chellappa (ed). Handbook of Remote Biometrics[M]. Springer,2010.
    [8]A. Blauth, M. Vicente, P. Jung Claudio et al. Voice activity detection and speaker localization using audio visual cues [J]. Pattern Recognition Letters, 2012,33(4):373-380.
    [9]Zhen Lei, Dong Yi, Stan Z. Li. Discriminant Image Filter Learning for Face Recognition with Local Binary Pattern Like Representation[C]. In Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition, (CVPR2012).Providence, Rhode island,2012:413-416.
    [10]Ross A,Jain A,Qian J Z.Information Fusion in Biometrics[J]. Pattern recognition Letters,2003,24(13):2115-2125.
    [11]Jinfeng Yang, Xu Zhang. Feature-level fusion of fingerprint and finger-vein for personal identification[J]. Pattern recognition letters,2012,33(1):623-628.
    [12]K. Bernardin, H. Ekenel, R. Stiefelhagen. Multimodal identity recognition in a smart room[J]. Personal and Ubiquitous Computing,2009,134(11):124-128.
    [13]Rahib Hidayat Abiyev and Koray Altunkays. Neural Network Based Biometric Personal Identification With Iris Recognition [J]. Internation Journal of Control,Automation and Systems,2009,7(1):17-23.
    [14]U. Kirchmaier, S. Hawe, K. Diepold. Dynamical information fusion of heterogeneous sensors for 3D tracking using particle swarm optimization [J].Information Fusion,2012,12(4):275-283.
    [15]Giulia Garau and Herv'e Bourlard. USING AUDIO AND VISUAL CUES FOR SPEAKER DIARISATION INITIALISATION[R]. Switzerland:Idiap Research Institute,2010.2
    [16]Sree Hari Krishnan Parthasarathi, Mathew Magimai.-Doss. EVALUATING THE ROBUSTNESS OF PRIVACY-SENSITIVE AUDIO FEATURES FOR SPEECH DETECTION IN PERSONAL AUDIO LOG SCENARIOS[R]. Switzerland:Idiap Research Institute,2010.2
    [17]国家中长期科学和技术发展规划纲要(2006-2020),http://www.gov.cn.
    [18]Anindya Roy, Sebastien Marcel. Visual processing-inspired Fern-Audio features for Noise-Robust Speaker Verification[R]. Switzerland: IdiapResearchInstitute,2010.2
    [19]Anindya Roy and Sebastien Marcel. Visual Processing Inspired Fern-Audio for Noise-Robust Speaker Verification[R]. Switzerland:Idiap Research Institute,2010.1
    [20]Shikun Feng, Zhen Lei, Dong Yi, Stan Z. Li. Online Content-aware Video Condensation[C], In Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition, (CVPR2012).Providence, Rhode island,2012:34-40.
    [21]R. Snelick et al., "Large-scale evaluation of multimodal biometric authentication using state-of-the-art systems"[J]. IEEE Trans. on Pattern Analysis and Machine Intelligence,27(3):450-455,2005.
    [22]CHIL-Computer In the Human Interaction Loop[EB/OL]. [2012]. http://chil. server,c
    [23]AMI Augmented Multimodal Interaction[EB/OL].2012.http://www.amiproject.org
    [24]CLEAR-Classification of Event, Activities and Relationships[EB/OL]. [2012]. http://www.clear-evaluation.org/
    [25]孙冬梅,裘正定.生物特征识别技术综述[J].电子学报,2001,29(12A)：1744-1748.
    [26]David Dean, Sridha Sridharan. Dynamic visual features for audio-visual speaker verification[J]. COMPUTER SPEECH AND LANGUAGE, 2011,24(2):136-149.
    [27]Anil K. Jain, Arun Ross, Salil Prabhakar. An introduction to biometric recognition[J]. IEEE Transactions Circuits and Systems for Video Technology, 2004,14(1):4-20.
    [28]Ogorman L. Comparing Passwords, Tokens, and Biometrics for User Authentication [J]. Proceedings of the IEEE,2003,91(11):2019-2040.
    [29]张祥德,张大为等.仿生算法与主成分析相融合的人脸识别算法[J].东北大学学报(自然科学版),2009,30(7)：972-975.
    [30]Guillaume Heusch, S'ebastien Marcel. A Novel Statistical Generative Model Dedicated To Face Recognition. IDIAP Research Report. May 2009:1-25.
    [31]肖小玲,李腊元.基于概率支持向量机方法的人脸识别[J].武汉理工大学学报(交通科学与工程版),2009,33(2):345-348.
    [32]P. Aishwarya, Karnan Marcus. Face recognition using multiple eigenface subspaces[J]. Journal of Engineering and Technology Research, 2010, 8(8):139-143.
    [33]李子荣,杜明辉.基于局部边界鉴别分析的人脸识别[J].电子与信息学报,2009,31(3)：527-531.
    [34]Mohamed Lamine Toure, Zou Beiji. Intelligent Sensor for Image Control Point of Eigenfaces for Face Recognition[J]. Journal of Computer Science, 2010,6(5):484-491.
    [35]王晓哲,李晨阳等.基于快速小波变换和FLD的人脸识别算法[J].东北大学学报(自然科学版),2009,30(2):166-169.
    [36]Anil Kumar Sao, B. Yegnanarayana. On the use of phase of the Fourier transform for face recognition under variations in illumination[J]. Signal, Image and Video Processing,2010,4(3):353-358.
    [37]Nazmeen Bibi Boodoo, R K Subramanian. Robust Multi-biometric Recognition Using Face and Ear Images[J]. International Journal of Computer Science and Information Security,2009,6(2):164-169.
    [38]Juan A. Morales-Cordovilla, Antonio M. Peinado. Feature Extraction Based on Pitch-Synchronous Averaging for Robust Speech Recognition[J].IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, 2011,19(3):640-651.
    [39]Qiang Wu, Liqing Zhang, and Guangchuan Shi.Robust Multifactor Speech Feature Extraction Based on Gabor Analysis[J].IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING,2011,19(4):927-936.
    [40]Kong Aik Lee, Chang Huai You.Using Discrete Probabilities With Bhattacharyya Measure for SVM-Based Speaker Verification[J]. IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, 2011,19(4):861-870.
    [41]Jwu-Sheng Hu, Ming-Tang Lee, Chia-Hsing Yang.An embedded audio-visual tracking and speech purification system on a dual-core processor platform[J]. Microprocessors and Microsystems,2010,34(3):274-284.
    [42]Eren Akdemir, Tolga Ciloglu. Bimodal automatic speech segmentation based on audio and visual information fusion[J]. Speech Communication, 2011, 53(10):889-902.
    [43]Junyong You, UlrichReiter a, MiskaM.Hannuksela. Perceptual-based quality assessment for audio-visual services:A survey[J]. Signal Processing:Image Communication,2010,25(6):482-501.
    [44]Vahid Asadpoura, Mohammad Mehdi Homayounpourb, Farzad Towhidkhahc. Audio-visual speaker identification using dynamic facial movements and utterance phonetic content[J].Applied Soft Computing, 2011,11(12): 2083-2093.
    [45]Kazuhiro Hotta. Local Normalized Linear Summation Kernel for Fast and Robust Recognition [J]. Pattern Recognition.2010,43 (3):906-913.
    [46]Chris McCool, Jordi Sanchez-Riera and S'ebastien Marcel. Feature Distribution Modelling Techniques for 3D Face Verification[R]. IDIAP Research Report. February 2010:30-45.
    [47]周佳立,张树有等.基于双目被动立体视觉的三维人脸重构与识别[J].自动化学报,2009,35(2)：123-131.
    [48]Satyanadh Gundimada. Face recognition in multi-sensor images based on a novel modular feature selection technique[J]. Information Fusion,2010,11(2): 124-132.
    [49]Khalid Chougdali, Mohamed Jedra, Nouredine Zahid. Kernel relevance weighted discriminant analysis for face recognition[J]. Pattern Analysis & Applications,2010, 13(2):213-232.
    [50]Antonio Rama, Francesc Tarres, Jurgen Rurainsky. Aligned texture map creation for pose invariant face recognition[J]. Multimedia Tools and Applications,2010,49(3):545-565.
    [51]Muharram Mansoorizadeh, Nasrollah Moghaddam Charkari. Multimodal information fusion application to human emotion recognition from face and speech[J]. Multimedia Tools and Applications,2010,49(2):277-297.
    [52]Jian-Gang Wang.Incremental two-dimensional linear discriminant analysis with applications to face recognition[J].Journal of Network and Computer Applications,2010,33(3):313-324.
    [53]严云洋,郭志波等.融合多尺度多特征的人脸识别方法[J].南京理工大学学报(自然科学版),2009,33(1):47-52.
    [54]Jianzhong Wang.An adaptively weighted sub-pattern locality preserving projection for face recognition[J]. Journal of Network and Computer Applications,2010, 33(3): 323-332.
    [55]李金秀,高新波等.一种基于E-HMM的选择性集成人脸识别算法[J].电子与信息学报, 2009,31(2):288-292.
    [56]Abhishek Sharma, Anamika Dubey, A. N. Jagannatha, R. S. Anand. Pose invariant face recognition based on hybrid-global linear regression[J]. Neural Computing & Applications,2010,19(8):1227-1235.
    [57]Sadaoki Furui. Recent advances in speaker recognition[J]. Pattern Recognition Letters,2008,18(9):859-872.
    [58]Li Liu, Jialong He, G.Palm. Signal modeling for speaker identification [C].Acoustics, Speech and Signal Processing International Conference 2008:73-76.
    [59]Padmanabhan Raj an,Sree Hari Krishnan Parthasarathi,Hema AMurthy. Robustness of phase based features for speaker recognition[R]. Switzerland: Idiap Research Institute, December 2009.
    [60]Guillermo Aradilla Zapata. Acoustic Model for Posterior Feature in Speech Recognition[D]. Switzerland:Lausance. The Swiss Federal Institute of Technology.2008
    [61]Dong Yuan,Lu Liang. Studies on Model Distance Normalization Approach in Text-independent Speaker Verfication[J]. ACTA AUTOMATICA SINICA, 2009,35(5): 556-560.
    [62]张俊.基于VQ和DTW相结合的语音识别算法研究[D].[硕士学位论文]武汉：武汉理工大学,2007.
    [63]Wark T,Sridharan S. Adaptive fusion of speech and lip information for robust speaker identification[J]. Digital Signal Processing, 2008,11(3):169-186.
    [64]Giulia Garau and Herv'e Bourlard. USING AUDIO AND VISUAL CUES FOR SPEAKER DIARISATION INITIALISATION[R]. Switzerland:Idiap Research Institute, February 2010.
    [65]Sree Hari Krishnan Parthasarathi, Mathew Magimai.-Doss. EVALUATING THE ROBUSTNESS OF PRIVACY-SENSITIVE AUDIO FEATURES FOR SPEECH DETECTION IN PERSONAL AUDIO LOG SCENARIOS[R]. Switzerland:Idiap Research Institute,February 2010.
    [66]Anindya Roy, Sebastien Marcel. Visual processing-inspired Fern-Audio features for Noise-Robust Speaker Verification[R]. Switzerland: IdiapResearchlnstitute,March 2010.
    [67]Jain A. K., Ross A., Pankanti S. Biometrics:A Tool for Information Security [J].IEEE Transactions on Information and Security, 2006,1(2):125-143.
    [68]Paul M. Evitts a, Lindsay Portugal, Ami Van Dine, Aline Holler d. Effects of audio-visual information on the intelligibility of alaryngeal speech[J].Journal of Communication Disorders,2010,43(1):92-104.
    [69]Christian Micheloni a, SergioCanazza, GianLuca Forest. Audio-video biometric recognition for non-collaborative access granting[J].Journal of Visual Languages and Computing,2009,20(4): 353-367.
    [70]Ross A A, Nandakumar D, Jain A K. Handbook of Multi-biometrics[M]. New York:Springer-Verlag,2006.
    [71]Duda R O, Hart P E, Stork D G. Pattern Classication (Second Edition)[M]. New York:Wiley-Interscience,2000.
    [72]Snelick R, Indovina M, Yen J, Mink A. Multimodal biometrics:issues in design and testing[C]. In:Proceedings of the 5th International Conference on Multimodal Interfaces. New York, USA:ACM,2003.68-72.
    [73]Jain A, Nandakumar K, Ross A. Score normalization in multimodal biometric systems[J]. Pattern Recognition,2005,38(12):2270-2285.
    [74]Lobrano C, Tronci R, Giacinto G, Roli F. A score decidability index for dynamic score combination[C]. In:Proceedings of the 20th International Conference on Pattern Recognition. Istanbul, Turkey:IEEE, 2010.69-72.
    [75]Kazuhiro Hotta. Local normalized linear summation kernel for fast and robust recognition[J].Pattern Recognition,2010,43(3): 906-913.
    [76]Duda R O, Hart P E, Stork D G. Pattern Classification (Second Edition)[M]. New York:Wiley-Interscience,2000.
    [77]Snelick R, Indovina M, Yen J, Mink A. Multimodal biometrics:issues in design and testing[C]. In:Proceedings of the 5th International Conference on Multimodal Interfaces. New York, USA:ACM,2003.68-72.
    [78]Jain A, Nandakumar K, Ross A. Score normalization in multimodal biometric systems. Pattern Recognition, 2005,38(12):2270-2285.
    [79]Dass S C, Nandakumar K, Jain A K. A principled approach to score level fusion in multimodal biometric systems[C]. In: Proceedings of the 5th International Conference on Audio and Video-based Biometric Person Authentication. New York, USA:Springer,2005.1049-1058.
    [80]Nandakumar K, Chen Y, Dass S C, Jain A K. Likelihood ratio-based biometric score fusion[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence,2008,30(2):342-347.
    [81]Roland Hu, R.I. Damper.Optimal weighting of bimodal biometric information with specific application to audio-visual person identification[J]. Information Fusion,2009,10 (2):172-182.
    [82]刘红毅,王蕴红等.基于改进ENN算法的多生物特征融合的身份验证[J].自动化学报,2004,30(1)：78-85.
    [83]Wang F, Han J. Multimodal biometric authentication based on score level fusion using support vector machine[J]. Opto Electronics Review,2009,17(1): 59-64.
    [84]Kumar A, Kanhangad V, Zhang D. A new framework for adaptive multimodal biometrics management[J]. IEEE Transactions on Information Forensics and Security, 2010,5(1):92-102.
    [85]Tronci R, Giacinto G, Roli F. Dynamic score combination of binary experts[C]. In:Proceedings of the 19th International Conference on Pattern Recognition. Florida, USA:IEEE,2008.1-4.
    [86]李文立,郭凯红.D-S证据理论合成规则及冲突问题[J].系统工程理论与实践,2010,30(8)：1422-1452.
    [87]SMARANDACHE F, DEZERT J. Applications and Advances of DSmT for Information Fusion[M]. Rehoboth:American Research Press,2009.
    [88]张捍东,王翠华等.基于焦元支持度的合成规则[J].控制理论与应用,2011,28(5)：741-744.
    [89]Yager R R. Comparing approximate reasoning and probabilistic reasoning using the Dempster-Shafer framework[J]. International Journal of Approximate Reasoning,2009,50(5):812-821.
    [90]孙全,叶秀清.一种新的基于证据理论的合成公式[J].电子学报,2000,28(8):117-119.
    [91]邓勇,施文康.一种改进的证据推理组合规则[J].上海交通大学学报,2003,37(8)：1275-1278.
    [92]张山鹰,潘泉.一种新的证据推理组合规则[J].控制与决策,2000,15(5):540-544.
    [93]Martin A, OsswaldC. Toward a combination rule to deal with partial conflict and specificity in belief functions theory[C]. International Conference on Information Fusion. Quebec:Canada,2007.
    [94]Lin T C. Partition belief median filter based on Dempster-Shafer theory for image processing[J].pattern Recognition,2008,41(1):13-151.
    [95]MurphyCK. Combining belief functions when evident conflicts[J]. Decision Support Systems,2000,29(1):1-9.
    [96]杨善林,李永森等.基于技术进步和信息不对称的证据合成研究[J].系统工程学报,2007,22(3):268-273.
    [97]Fan X F, Huang H Z, Miao Q. Evidence relationship matrix and its application to D-S evidence theory for information fusion[J]. LNCS IDEAL, 2006,4224: 1367-1373.
    [98]Xu Gp, Tian WF, Qian L et.al. A novel conflict reassignment method based on gray relation analysis[J].pattern Recognition Letters,2007,28(15):2080-2087.
    [99]S. Basu, T. Choudhury, B. Clarkson, and A. Pentland. Learning Human Interactions with the Influence Model[D]. In NIPS,2001.
    [100]H. Hung, D. Jayagopi, C. Yeo, G. Friedland, S. Ba, J.M. Odobez, K. Ramchandran, N. Mirghafori, and D. Gatica-Perez. Using Audio and Video Features to Classify the Most Dominant Person in a Group Meeting[C]. In Proc. ACM Int. Conf. On Multimedia(ACM MM), Augsburg, Sep.2007.
    [101]D.B. Jayagopi, H. Hung, C. Yeo, and D. Gatica-Perez. Modeling Dominance in Group Conversations Using Non-verbal Activity Cues[R]. IDIAP Research Report, December 2007.
    [102]H. Friedland, Y. Huang, C. Yeo, and D. Gatica-Perez. Associating Audio-Visual Activity Cues in a Dominance Estimation Framework[C]. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition(CVPR)Workshop on Human Communicative Behavior, Ankorage, Alaska, 2008.
    [103]Hayley Hung, Yan Huang, Gerald Friedland, Daniel Gatica-Perez.Estimating Dominance in Multi-Party Meeting using Speaker Diarization[J].IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, 2012,20(4)：847-860.
    [104]吴志勇,蔡莲红.基于动态贝叶斯网络的音视频双模态说话人识别[J].计算机研究与发展,2006,43(3):470-475.
    [105]赵晖,顾亚强.基于乘积HMM的双模态语音识别方法[J].计算机工程,2010.36(8)：7-9.
    [106]陈雁翔,刘鸣.智能环境中音视频双模态的身份辨识[J].中国科学技术大学学报,2010,40(5)：486-490.
    [107]Z M Hafed,M DLevine.Face recognition using the discrete cosine transform[J].International Journal of Computer Vision,2001,43(3):167-188.
    [108]D Ramasubramanian,Y V Venkatesh.Encoding and recognition of faces based on the human visual model and DCT[J]. Pattern Recognition,2001,34(]2):2447-2458.
    [109]张燕昆,刘重庆.一种新颖的基于LDA的人脸识别方法[J].红外与毫米波学报,2003,22(5)：327-330.
    [110]李建科,赵保军等.DCT和LBP特征融合的人脸识别[J].北京理工大学学报,2010,30(11)：1355-1359.
    [111]Z M Hafed,M D Levine. Face recognition using the discrete cosine transform [J].International J ournal of Computer Vision,2001 ,43 (3):167-188.
    [112]W Chen,J E Meng ,S Wu. PCA and LDA in DCT domain [J].Pattern Recognition Letters,2005,26 (15):2474-2482.
    [113]Saralees Nadarajah. Gaussian DCT Coefficient Models [J]. Acta Appl Math,2009,106(3):455-472.
    [114]PARK H,PARK C H. A comparison of generalized linear discriminant analysis algorithm[J].Pattern Recognition,2008,41 (3):1083-1097.
    [115]Yong Xu, Anni Zhong, David Zhang.LPP solutionschemes for use with face recognition [J]. Pattern Recognition,2010,43(12):4165-4176.
    [116]辜小花,龚卫国.核保局鉴别分析人脸识别算法[J].仪器仪表学报,2010,31(9)：2016-2021.
    [117]ORLfacedatabase[EB/OL].http://www.cam-orl.co.uk/facedatabase.html. AT&T Laboratories Cambridge.
    [118]Yalefacedatabase[EB/OL].http://www.es.columbia.edu/belhumeur/pub/images /yalefaces/, Colubmbia University.
    [119]Tyler K. Perrachione, Stephanie N. Del Tufo, John D. E. Gabrieli. Human Voice Recognition Depends on Language Ability[J]. Science,2011,333:595.
    [120]Parvin Zarei Eskikanda, Seyyed Ali Seyyedsalehia. Robust speech recognition by extracting invariant features[J].Procedia-Social and Behavioral Sciences, 2012,32(3):230-237.
    [121]Shao Yang, Jin Zhaozhuang, Wang Deliang. An auditory based feature for robust speech recognition[C]. ICASSP, Taibei, Tanwan,2009:4625-4628.
    [122]Jun Du,Qiang Huo.A Feature Compensation Approach Using High-Order Vector Taylor Series Approximation of an Explicit Distortion Model for Noisy Speech Recognition[J].IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING,2011,19(8):2285-2293.
    [123]Md. Sahidullah, Goutam Saha. Design, analysis and experimental evaluation of block based transformation in MFCC computation for speaker recognition[J]. Speech Communication,2012,54 (4):543-565.
    [124]PARK H,PARK C H. A comparison of generalized linear discriminant analysis algorithm[J].Pattern Recognition,2008,41(3):1083-1097.
    [125]Shuyan Wang. Face recognition using enhanced linear discriminant analysis[J]. Computer Vision, IET,2010,4(3):195-208.
    [126]邹建法,王国胤.基于增强Gabor特征和直接分步线性判别分析的人脸识别[J].模式识别与人工智能,2010,23(4):477-482.
    [127]PP. Aishwarya, Karnan Marcus. Plastic Surgery:A New Dimension to Face Recognition[J]. Information Forensics and Security, IEEE Transactions on, 2010,5(3):441-448.
    [128]Baochang Zhang, Yu Qiao. Face recognition based on gradient gabor feature and Efficient Kernel Fisher analysis[J]. Neural Computing & Applications, 2010,19(4):617-623.
    [129]Loog M, Duin RPW, Hacb-Umbach R Multiclass linear dimension reduction by weighted pairwise Fisher criteria[J]. IEEE Trans P AMI,2001, 23(7):762-766.
    [130]Lotlikar R, Kothari R. Fractional-step dimensionality reduction[J]. IEEE Trans Pattern Anal Mach Intell,2000,22(6):623-627.
    [131]Tang EK, Suganthan PN, Yao X, Qin AK. Linear dimensionality reduction using relevance weighted LDA[J]. Pattern recognition,2005,38(4):485-493.
    [132]马晓红,赵琳琳.基于QR分解和提升小波变换的鲁棒音频水印方法[J].大连理工大学学报,2010,50(2):278-282.
    [133]Zhiming Liu, Chengjun Liu. Fusion of color, local and global frequency information for face recognition [J]. Pattern Recognition,2010, 43(8):2882-2890.
    [134]X. He, S. Yan, Y. Hu, P. Niyogi, H. Zhang. Face recognition using Laplacian faces[J] IEEE Transactions on Pattern Analysis and Machine Intelligence,2005, 27(3):328-340.
    [135]H. Hu. Orthogonal neighborhood preserving discriminant analysis for face recognition[J]. Pattern Recognition,2008,41:2045-2054.
    [136]Zheng Ji, Xiao-Chen Lian and Bao-Liang Lu. Gender Classification by Information Fusion of Hair and Face[C].Published by In-The. November 2008.
    [137]LIU Shuang, X IE J in-rong, LU Bao-liang.Gender Classification UsingHair Features [J].Journal of Computer Simulation.2009:26(2),212-216.
    [138]Yacoob, Y. Detection and analysis of hair[J]. IEEE Trans. Pattern Anal. Mach. Intell.2006,28(7):1164-1169.
    [139]Member-Larry S. Davis.Lapedriza, A., Masip, D., Vitria, J. Are external face features useful for automatic face classification? In:CVPR'05:Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
    [140]Yacoob, Y., Davis, L. Detection, analysis and matching of hair[J]. The tenth IEEE International Conference on Computer Vision:741-748.
    [141]Z M Hafed,M D Levine. Face recognition using the discrete cosine transform[J].International Journal of Computer Vision,2001,43 (3):167-188.
    [142]W. Yu, X. Teng, C. Liu. Face recognition using discriminant locality preserving projections [J] Image Vision Computing,2006,24:239-248.
    [143]Wen-Chung Kao, Ming-Chai Hsu, Yueh-Yiing Yang. Local contrast enhancement and adaptive feature extraction for illumination invariant face recognition [J].Pattern Recognition,2010,43(5):1736-1747.
    [144]Kazuhiro Hotta. Local normalized linear summation kernel for fast and robust recognition [J].Pattern Recognition,2010,43(3):906-913.
    [145]Quan Le, Samy Bengio. Client Dependent GMM-SVM Models for Speaker Verification[R]. Switzerland:Idiap Research Institute, Feb.2010.
    [146]X. Anguera, C. Wooters, B. Pesking, M. Aguilo. Robust Speaker Segmentation for Meetings:The ICSI-SRI Spring 2005 Diarization System[C]. Proc. NIST MLMI Meeting Recognition Work-shop,2005.
    [147]Qin Jin, Schultz T, Waibel A. Far-Field Speaker Recognition[J]. IEEE Trans. on Audio, Speech and Language processing,2007,15(7):2023-2032.
    [148]鲍焕军,郑方GMM-UBM和SVM说话人辨认系统及融合的分析[J].清华大学学报(自然科学版),2008,48(S1)：693-698.
    [149]杨海,张翔,梁春燕等.联合因子分析和稀疏表示在稳健性说话人确认中的应用[J].声学学报,2012,37(5):548-552.
    [150]Tomas Pfister, Peter Robinson, Real-Time Recognition of Affective States from Nonverbal Features of Speech and Its Application for Public Speaking Skill Analysis[J]. IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2011,2(2):66-78.
    [151]Yongjun He, Jiqing Han.Gaussian Specific Compensation for Channel Distortion in Speech Recognition[J]. IEEE SIGNAL PROCESSING LETTERS, 2011,18(10):599-602.
    [152]Omid Dehzangi, Bin Mab, Eng Siong Chng, Haizhou Li. Discriminative feature extraction for speech recognition using continuous output codes[J]. Pattern Recognition Letters,2012,33:1703-1709.
    [153]Deepu Vijayasenan, Fabio Valentel, Herv'e Bourlard, MULTISTREAM SPEAKER DIARIZATION BEYOND TWO ACOUSTIC FEATURE STREAMS[R].IAIAP Research Report,2012.
    [154]Liu Di, Sun DongMei, Qiu ZhengDing. Feature Level Fusion Based on Speaker Verification Via Relation Measurement Fusion Framework[J]. ACTA AUTOMATIC SINICA,2011,37(12):1503-1513.
    [155]Gui-Fu Lu, Zhong Lin, Zhong Jin. Face recognition using discriminant locality preserving projections based on maximum margin criterion[J]. Pattern Recognition,2010,43(10):3572-3579.
    [156]Yong Wang, Yi Wu. Face recognition using Intrinsicfaces[J]. Pattern Recognition,2010, 43(10):3580-3590.
    [157]Eric Sung, Wei-Yun Yau. Face Recognition in Global Harmonic Subspace[J].Information Forensics and Security, IEEE Transactions on,2010, 5(3):416-424.
    [158]eong Y. Speaker adaptation based on the multilinear decomposition of training speaker models[C]. In:Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. Dallas, USA:IEEE,2010. 4870-4873.
    [159]Yongjun He, Jiqing Han.Gaussian Specific Compensation for Channel Distortion in Speech Recognition[J]. IEEE SIGNAL PROCESSING LETTERS, 2011,18(10):599-602.
    [160]辜小花,龚卫国,杨利平.有监督图优化保局投影[J].光学精密工程,2011,19(3):672-680.
    [161]Md. J. Alam, T. Kinnunen, P. Kenny, P. Ouellet, and D. O'Shaughnessy. Multitaper MFCC features for speaker verification using i-vectors[C]. In Proc. IEEE Automatic Speech Recognition and Understanding (ASRU2011), Hawaii, December 2011:547-552.
    [162]Jun Du,Qiang Huo.A Feature Compensation Approach Using High-Order Vector Taylor Series Approximation of an Explicit Distortion Model for Noisy Speech Recognition[J].IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING,2011,19(8):2285-2293.
    [163]Hamid Reza Tohidypour, Seyyed Ali Seyyedsalehi, Hossein Behbood,Hossein Roshandel. A new representation for speech frame recognition based on redundant wavelet filter banks[J].Speech Communication, 2012,54:256-271.
    [164]P. Matejka et al. Full-Covariance UBM and Heavy-Tailed PLDA in I-Vector Speaker Verification[C]. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic,2011.
    [165]郭武,李轶杰,戴礼荣等.采用非监督得分规整和因子分析的说话人确认[J].电子学报,2009,37(4)：776-779.
    [166]郭武,李轶杰,戴礼荣等.说话人识别中的因子分析及空间拼接[J].自动化学报,2009,35(9)：1194-1199.
    [167]Shafer G.A mathematical theory of evidence[M]. Princeton:Princeton University Press,1976.
    [168]Zeng DH, Xu JM, Xu G. Data fusion for traffic incident detection using D-S evidence theory with probabilistic SVMs[J]. Journal of Computers,2008,3(10):36-43.
    [169]关欣,衣晓,孙晓明等.有效处理冲突证据的融合方法[J].清华大学学报：自然科学版,2009,49(1):138-141.
    [170]Klement, Erich Peter, Mesiar. Triangular norms. Position Paper I:Basic analytical and algebraic properties[J]. Fuzzy Sets and Systems,2004,143:5-26.
    [171]胡丽芳,关欣,邓勇等.一种三角模糊数型多属性决策方法[J].控制与决策,2011,26(12)：1277-1281.
    [172]韩德强,韩崇昭,邓勇等.基于证据方差的加权证据融合[J].电子学报,2011,34(3)：153-157.
    [173]F. Cardinaux, C. Sanderson, and S. Bengio. User Authentication via Adapted Statistical Models of Face Images[J]. IEEE Trans. on Signal Processing,2006,54(1):361-373.
    [174]Anne M.P. Canuto, Fernando Pintro, João C. Xavier-Junior. Investigating fusion approaches in multi-biometric cancellable recognition[J]. Expert Systems with Applications,2013,40:1971-1980.
    [175]Roberto Tronci, Giorgio Giacinto, Fabio Roli.Designing multiple biometric systems:Measures of ensemble effectiveness[J].Engineering Applications of Artificial Intelligence,2009,22:66-78.
    [176]邓勇,蒋雯,韩德强.广义证据理论的基本框架子[J].西安交通大学学报,2010,44(12)：119-124.
    [177]权文,王晓丹,史朝辉等.多源不确定信息融合中的冲突证据快速合成方法[J].系统工程与电子技术,2012,34(2):333-336.
    [178]孔祥兵,舒宁,陶建斌等.一种基于多特征融合的新型光谱相似性测度[J].光谱学与光谱分析,2011,31(8):2166-2170.
    [179]胡丽芳,关欣,邓勇等.广义幂集空间中证据冲突的原因分析[J].控制理论与应用,2011,28(12)：1717-1722.
    [180]Shankar T. Shivappa, Mohan M. Trivedi, and Bhaskar D. Rao. Audio-visual Information Fusion In Human Computer Interfaces and Intelligent Environments:A survey[J].To Appear in Proceedings of the IEEE,2010,1-21.
    [181]Maria De Marsico, Michele Nappi, Daniel Riccio, Genny Tortora. A multiexpert collaborative biometric system for people identification[J]. Journal of Visual Languages and Computing,2009,20:91-100.
    [182]Norman Poh, Arun Ross, Weifeng Lee, Josef Kittler. A user-specific and selective multimodal biometric fusion strategy by ranking subjects[J]. Pattern Recognition,2013,46:3341-3357.
    [183]N. Poh, A. Merati, J. Kittler. Heterogeneous information fusion:a novel fusion paradigm for biometric systems[C]. In:International Joint Conference on Biometrics (IJCB),2011:1-8.
    [184]N. Poh, J. Kittler. On using error bounds to optimize cost-sensitive multimodal biometric authentication[C]. In:Proceedings of the 19th International Conference on Pattern Recognition (ICPR),2008:1-4.
    [185]顾鑫,王海涛,汪凌峰等.基于不确定性度量的多特征融合跟踪[J].自动化学报,2011,37(5)：550-559.
    [186]钟小品,薛建儒,郑南宁等.基于融合策略自适应的多线索跟踪方法[J].电子与信息学报,2007,29(5)：1017-1021.
    [187]B. Khaleghi, A. Khamis, F. Karray. Random finite set theoretic based soft/hard data fusion with application for target tracking[C]. in:Proc. of the IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems,2010:50-55.
    [188]Adil Omari and Anibal R. Figueiras. Feature Combiners With Gate Generated Weights for Classification[J]. IEEE Transactions on Neural Networks and Learning Systems,2013,24(1):158-163.
    [189]Bahador Khaleghi, Alaa Khamis, Fakhreddine O. Karray, Saiedeh N. Razavi. Multisensor data fusion:A review of the state-of-the-art[J]. Information Fusion,2013,14(1):28-44.
    [190]Md. Maruf Monwar, Marina L. Gavrilova. Multimodal Biometric System Using Rank-Level Fusion Approach[J]. IEEE Transactions on Systems, Man, and Cybernetics-Part B:Cybernetics,2009,39(4):867-878.
    [191]J. Kittler, F.M. Alkoot. Sum versus vote fusion in multiple classifier systems [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence,2003,25 (1):110-115.
    [192]Ajay Kumar, Sumit Shekhar.Personal Identification Using Multibiometrics Rank-Level Fusion[J].IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS - PART C:APPLICATIONS AND REVIEWS,2011,42(5):743-752.
    [193]Md. Maruf Monwar, Marina Gavrilova. Markov chain model for multimodal biometric rank fusion[J]. SIViP,2013,7:137-149.
    [194]Monwar, M.M., Gavrilova, M. A multimodal biometric system using rank level fusion approach[J]. IEEE Trans. SMC B:Cyber,2009,39(4):867-878.
    [195]Romain Giot, Christophe Rosenberger. Genetic programming for multibiometrics[J]. Expert Systems with Applications,2012,39:1837-1847.
    [196]Kumar, A., Kanhangad, V., Zhang, D. A new framework for adaptive multimodal biometrics management[J]. IEEE Trans. Inf. Forensics Secur.,2009,5(1):92-102.
    [197]Ross A, Nandakumar K, Jain A K. Handbook of Multibiometrics [M].Springer, New York.
    [198]Adil Omari and Anibal R. Figueiras. Feature Combiners With Gate Generated Weights for Classification[J]. IEEE Transactions on Neural Networks and Learning Systems,2013,24(1):158-163.
    [199]Zhang, L., Zhang, D., Zhu, H. Online finger-knuckle-print verification for personal authentication[J]. Pattern Recognition,2010,43(7):2560-2571.
    [200]Jain A, Nandakumar K, Ross A. Score normalization in multimodal biometric systems[J]. Pattern Recognition,2005,38(12):2270-2285.
    [201]Yogendra Narain Singh, Sanjay Kumar Singh, Phalguni Gupta. Fusion of electrocardiogram with unobtrusive biometrics:An efficient individual authentication system[J]. Pattern Recognition Letters,2012,33:1932-1941.