Personal Identification Based on Audio and Video Feature Fusion

Author: Wu Di (吴迪)
Thesis level: Master's
Discipline: Signal and Information Processing
Keywords: information fusion ; face recognition ; speaker recognition ; multi-biometric identification ; pulse coupled neural network
Degree year: 2010
Supervisor: Cao Jie (曹洁)
Discipline code: 081002
Degree-granting institution: Lanzhou University of Technology (兰州理工大学)
Thesis submission date: 2010-06-04
Chair of the defense committee: Dang Jianwu (党建武)

Abstract

Single-mode speaker recognition and face recognition are limited in accuracy and in the range of applications they suit. Approaching the problem from the perspective of information fusion, this thesis fuses the two kinds of single-mode information at the feature level to realize personal identification based on bimodal audio and video feature fusion.
     First, the thesis analyzes single-mode speaker recognition and face recognition. Combining the respective advantages of the VQ and SVM recognition models, it implements a hybrid VQ-SVM speaker recognition model. For the eigenface face recognition algorithm, it compares four distance measures: the L1-norm, Euclidean distance, MIN distance, and hybrid Mahalanobis distance. It then applies the pulse coupled neural network (PCNN) to face recognition and builds a face recognition system on that basis.
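As a rough illustration of how the choice of distance measure enters eigenface matching, the sketch below projects samples onto a PCA basis and runs nearest-neighbor matching under three measures. It is a minimal sketch under stated assumptions, not the thesis's implementation: the data are synthetic random vectors rather than face images, the function names are hypothetical, and the plain variance-weighted Mahalanobis distance stands in for the "hybrid Mahalanobis" measure, whose definition the abstract does not give. The "MIN distance" is omitted for the same reason.

```python
import numpy as np

def pca_basis(train, k):
    """Return the mean vector, the top-k principal directions (rows), and their variances."""
    mean = train.mean(axis=0)
    centered = train - mean
    # SVD of the centered data: rows of vt are the principal directions (eigenfaces).
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    variances = (s ** 2) / (len(train) - 1)
    return mean, vt[:k], variances[:k]

def project(x, mean, basis):
    """Project one sample or a batch of samples onto the eigenface basis."""
    return (x - mean) @ basis.T

def l1(a, b, var):
    return np.abs(a - b).sum()

def euclidean(a, b, var):
    return np.sqrt(((a - b) ** 2).sum())

def mahalanobis(a, b, var):
    # In PCA space the covariance is diagonal, so the Mahalanobis distance
    # reduces to a Euclidean distance weighted by 1 / variance per component.
    return np.sqrt((((a - b) ** 2) / var).sum())

def nearest(query, gallery, labels, dist, var):
    """Return the label of the gallery sample closest to the query under `dist`."""
    distances = [dist(query, g, var) for g in gallery]
    return labels[int(np.argmin(distances))]

# Synthetic stand-in data (assumption): 3 identities, 4 samples each, 64 dimensions.
rng = np.random.default_rng(0)
centers = rng.normal(size=(3, 64)) * 5.0
train = np.vstack([c + 0.3 * rng.normal(size=(4, 64)) for c in centers])
labels = np.repeat(np.arange(3), 4)
query = centers[1] + 0.3 * rng.normal(size=64)  # a new sample of identity 1

mean, basis, var = pca_basis(train, k=2)
gallery = project(train, mean, basis)
q = project(query, mean, basis)
results = {d.__name__: nearest(q, gallery, labels, d, var)
           for d in (l1, euclidean, mahalanobis)}
print(results)
```

On well-separated synthetic data all three measures agree; differences between measures would only show up on real face data with overlapping classes.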
     Second, the thesis focuses on bimodal recognition based on audio-video feature fusion. Because feature-level fusion retains a large amount of usable information and is suitable for real-time processing, the thesis implements two feature-level fusion recognition algorithms, one based on normalization and SVM and one based on PCNN. The former concatenates the speech features and the face features into a single vector; the latter fuses the entropy sequences of the two kinds of features. Experiments show that the fusion systems achieve higher recognition rates than the single-mode systems. In particular, when noise is added to the speech signal, the recognition rate of the speaker recognition system alone drops quickly, while the recognition rate of the fusion system remains at a good level.
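The normalization-and-concatenation step of the first fusion algorithm can be sketched as follows. This is a minimal sketch under assumptions: the abstract does not say which normalization is used, so z-score normalization is assumed here; the feature dimensions (12 audio, 20 face) and all function names are hypothetical; and the data are random. The SVM classifier that the thesis trains on the fused vectors is not shown.

```python
import numpy as np

def zscore_fit(features):
    """Estimate per-dimension mean and standard deviation from training features."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant dimensions
    return mean, std

def fuse(audio, face, audio_stats, face_stats):
    """Normalize each modality separately, then concatenate into one feature vector."""
    a = (audio - audio_stats[0]) / audio_stats[1]
    f = (face - face_stats[0]) / face_stats[1]
    return np.concatenate([a, f], axis=-1)

# Synthetic stand-in features (assumption): the two modalities live on very
# different numeric scales, which is why normalization precedes concatenation.
rng = np.random.default_rng(0)
audio_train = rng.normal(size=(50, 12)) * 100.0  # e.g. cepstral-like features
face_train = rng.normal(size=(50, 20)) * 0.01    # e.g. eigenface coefficients

audio_stats = zscore_fit(audio_train)
face_stats = zscore_fit(face_train)
fused = fuse(audio_train, face_train, audio_stats, face_stats)
print(fused.shape)
```

Without the per-modality normalization, the larger-scale modality would dominate any distance- or margin-based classifier applied to the concatenated vector.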

