Research on Dynamic Lip-Shape Identity Recognition Based on Time Series

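English title: Speaker Identification in Time-Sequential Images Based on Movements of Lips
Author: Hu Yingjie (胡颖杰)
Degree level: Master's
Discipline: Computer Software and Theory
Keywords (translated from Chinese): identity recognition; lip shape; time series; hidden Markov model; Gabor transform
English keywords: identification; mouth shapes; time-sequential images; HMM; Gabor wavelet transform
Degree year: 2011
Advisor: Meng Yingjie (蒙应杰)
Discipline code: 081202
Degree-granting institution: Lanzhou University
Thesis submission date: 2011-04-01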

Abstract

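As traditional security authentication and the networked society develop, secure and effective real-time identity authentication is receiving growing attention. In real-time identification based on biometric features, extracting dynamic features that are distinctive of an individual is the key factor determining an algorithm's recognition performance and feasibility.
     Most existing work relies on a single static image and therefore fails to exploit how the lips change over time; feature extraction also has to process a large amount of information, and recognition rates are low. To address these limitations, this thesis proposes dynamic lip-shape identification based on time series: a time series of feature vectors is extracted for each speaker, a hidden Markov model (HMM) reflecting that speaker's characteristics is built from it, and the identification decision is made with that model. The main work and results are as follows.
     (1) Building on existing biometric recognition and dynamic lip-feature extraction techniques, a time-series-based dynamic lip-shape recognition system is constructed and its architecture designed. The system has three parts: preprocessing, feature extraction, and time-series-based identification.
     (2) For feature extraction, a hybrid model for extracting inner-lip feature information based on a point model is designed. The functional relationships among the model's components are given, along with concrete algorithms for constructing the basic lip-print sequence, computing the mean model, and constructing the feature-vector sequence. Simulation experiments verify the feasibility of the inner-lip hybrid feature extraction algorithm.
     (3) For recognition, a time-series-based identification model is designed that uses hidden Markov models to process the time series. Concrete algorithms are given for the key steps: determining the HMM states, initializing the parameters, training and optimization, and identification by occurrence probability. Simulation experiments verify the performance of the time-series-based identification algorithm and provide a comparative analysis.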
With the development of traditional security authentication and the networked society, identification that is effective, safe, and real-time is attracting more and more attention. In identification based on biological features, the extraction of distinctive biological features determines the recognition performance.
     Traditional lip feature extraction is based on a single static image and neglects lip movement; it also suffers from a large volume of information to process and a low recognition rate. We therefore propose a method of speaker identification in time-sequential images based on the movements of the lips. The method extracts feature information from each frame of a video's time sequence, and an HMM of each speaker's lip movements is constructed and trained for identification. We designed the whole system; the main research is summarized below:
     (1) Building on research into feature extraction from visible biological features, we constructed an identification system that uses time-sequential images to capture lip movement for identification. The system comprises three parts: preprocessing, feature extraction, and identification.
     (2) We designed a mixed extraction model that uses feature information of the inner lips based on feature points, and reduced the dimensionality of the feature vectors by calculating similarities. The functional relationships among the parts of the model and the specific processing methods are given. Finally, we verified feasibility through simulation experiments.
     (3) We designed the identification model based on time sequences, which builds HMMs, generates the observed symbol sequence, and matches it against the models in the database. The details of several key algorithms are also given, and the performance of our algorithms is analyzed experimentally.
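
The identification step in item (3) can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes each speaker is represented by an already trained discrete HMM (start, transition, and emission probabilities), that the lip-feature vectors of each frame have already been quantized into integer symbols, and that the speaker whose model gives the observed sequence the highest probability is chosen. The function names `log_likelihood` and `identify`, the speaker labels, and all numbers are hypothetical.

```python
import math


def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM.

    obs   -- non-empty list of integer symbols (one per video frame)
    start -- start[i]    = P(first state is i)
    trans -- trans[i][j] = P(next state is j | current state is i)
    emit  -- emit[i][k]  = P(symbol k | state i)
    """
    n_states = len(start)
    # Forward variables for the first frame.
    alpha = [start[i] * emit[i][obs[0]] for i in range(n_states)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step through the transitions, then emit obs[t].
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n_states)) * emit[j][obs[t]]
                for j in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the sequence is impossible under this model
        # Rescale to avoid underflow on long sequences; accumulate the log.
        alpha = [a / scale for a in alpha]
        log_prob += math.log(scale)
    return log_prob


def identify(obs, models):
    """Return the label of the model under which obs is most probable."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))


# Hypothetical two-state, two-symbol left-to-right models for two speakers.
MODELS = {
    "speaker_a": (
        [1.0, 0.0],
        [[0.7, 0.3], [0.0, 1.0]],
        [[0.9, 0.1], [0.2, 0.8]],
    ),
    "speaker_b": (
        [1.0, 0.0],
        [[0.7, 0.3], [0.0, 1.0]],
        [[0.1, 0.9], [0.8, 0.2]],
    ),
}

if __name__ == "__main__":
    print(identify([0, 0, 1, 1], MODELS))
    print(identify([1, 1, 0, 0], MODELS))
```

Working in log space with per-frame rescaling keeps the score finite for long videos, where the raw product of probabilities would underflow. Training the models (for example with Baum-Welch) and quantizing the feature vectors are outside the scope of this sketch.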

