Research on a Lipreading System Based on a Single Visual Channel
Abstract
Lipreading (or speechreading) is a research direction that has grown out of the joint development of artificial intelligence, image processing, pattern recognition and related fields. It is widely used to improve automatic speech recognition in noisy environments, and is also applied to identity authentication in security systems, long-distance semantic recognition, language learning for the hearing-impaired, lip-based language learning for the elderly, and lip-command recognition in assistive systems for the disabled. Most current research treats the visual channel as a supplement to the audio channel for improving speech recognition rates. In genuinely high-noise environments, however, the information carried by the audio channel drops sharply (and for the hearing-impaired the audio channel conveys no information at all), so the recognition rate depends mainly on the visual channel, and research on semantic recognition from a single visual channel becomes essential. Visual-only lipreading research is still at an early stage: it has been limited to small vocabularies, and its recognition rates are relatively low. The goals of this thesis are to extend the vocabulary to a larger size and to improve the recognition rate of visual-only lipreading.
This thesis investigates several key problems in visual-only lipreading systems in a systematic, in-depth and broad manner. The main research work and contributions are as follows:
(1) Existing lipreading databases at home and abroad were surveyed, and the HITBICAVDatabase of Harbin Institute of Technology was adopted as the main corpus. From it, a subset suited to this research, database9603, was built by selecting one character for each phonetic transcription. The region of interest was then extracted from every image in the subset, yielding a database that can be used directly for feature extraction and recognition. In addition, a small bimodal lipreading database was recorded, containing 10 male and 10 female speakers each uttering the digits 0 to 9 ten times, and the corresponding preprocessing was carried out on it.
(2) For extraction of the lip region of interest (ROI), a method based on facial structure and gray-level information is proposed. Analysis of a large number of faces shows that the width of the mouth is roughly comparable to the distance between the two eyes, so the pupils are used to locate the left and right boundaries of the lip region, to scale the lip image to a specified size and to correct its horizontal alignment, while gray-level projection detects the lip corners and fixes the vertical position. Because the extracted images share a fixed reference, they faithfully reflect the size and shape of the lips, and the method is robust to camera zoom and head tilt. For lip segmentation, a method based on the 'a' component of the LAB color space is proposed: a study of the separability of the components of several color spaces under the Fisher criterion identified the chromatic component 'a' as the one that most effectively separates lips from non-lip regions (skin, teeth, beard, etc.). The method segments the lips well and derives its threshold automatically from image statistics, which facilitates fully automatic lip extraction; a sketch of this step is given below. For contour-based extraction, a manifold-based lip-contour method is proposed, and experiments show that its contours lie closer to the true lip outline. Combining the 'a'-component method with the manifold method gives still better lip extraction results.
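Since the abstract describes the 'a'-component segmentation only at a high level, the following is a minimal Python sketch of the idea. Otsu's method is substituted here as one automatic, image-driven threshold; it is an assumption, not the thesis's exact thresholding rule.

```python
# Minimal sketch: lip segmentation on the CIELAB 'a' component.
# Otsu's threshold stands in for the thesis's automatic threshold.
import numpy as np
from skimage import color, filters

def segment_lips_a_component(rgb_image: np.ndarray) -> np.ndarray:
    """Return a boolean lip mask for an RGB mouth-region image."""
    lab = color.rgb2lab(rgb_image)      # RGB -> CIELAB
    a = lab[:, :, 1]                    # 'a': green (-) to red (+) axis
    thresh = filters.threshold_otsu(a)  # automatic threshold (assumption)
    return a > thresh                   # lips are redder than skin/teeth
```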
(3) Lip feature representation is studied. A DT-CWT+PCA feature extraction method is proposed: the dual-tree complex wavelet transform (DT-CWT) is approximately shift-invariant and directionally selective, so it captures the edge and frequency-domain information of the lip ROI well and tolerates the small displacements introduced during ROI extraction. Experiments show that this representation raises the recognition rate over a DCT+PCA baseline. Because rearranging the DT-CWT magnitude coefficients into a vector discards the geometric structure of the data, and local information matters, a combined space-frequency method, DT-CWT+LBP+PCA, is further proposed. Its features capture both frequency-domain and spatial-domain information, reflect local as well as global structure, and are invariant to translation and rotation. Experiments show that this space-frequency method improves the lipreading recognition rate considerably; a sketch of the DT-CWT+LBP chain follows.
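A minimal sketch of the DT-CWT+LBP feature chain, assuming the third-party `dtcwt` package together with scikit-image and scikit-learn; the decomposition depth, LBP parameters and coefficient handling are illustrative assumptions, not the thesis's exact settings.

```python
# Minimal sketch: LBP histograms over DT-CWT magnitude subbands,
# followed by PCA. ROI is assumed grayscale with even height/width.
import numpy as np
import dtcwt
from skimage.feature import local_binary_pattern
from sklearn.decomposition import PCA

def dtcwt_lbp_feature(gray_roi: np.ndarray, nlevels: int = 3) -> np.ndarray:
    """Concatenate uniform-LBP histograms of each oriented subband."""
    coeffs = dtcwt.Transform2d().forward(gray_roi.astype(float), nlevels=nlevels)
    hists = []
    for level in coeffs.highpasses:        # 6 oriented subbands per level
        for k in range(level.shape[2]):
            mag = np.abs(level[:, :, k])   # magnitudes are ~shift-invariant
            lbp = local_binary_pattern(mag, P=8, R=1, method="uniform")
            h, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
            hists.append(h)
    return np.concatenate(hists)

# Features from many ROIs are then stacked and reduced with PCA, e.g.:
# X = np.vstack([dtcwt_lbp_feature(roi) for roi in rois])
# X_red = PCA(n_components=40).fit_transform(X)
```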
(4) Effective dimensionality reduction of lip features is studied. A DCT+ONPP feature extraction method is proposed: orthogonal neighborhood preserving projections (ONPP) reduce the dimension while preserving the geometric structure of the data. Experiments show that this method raises the recognition rate over DCT+PCA and is better suited to lipreading. Among supervised methods, locality sensitive discriminant analysis (LSDA) is applied to extract lip features; LSDA combines the advantages of LDA and LPP and fully captures the local geometric structure of the lips. Experiments show that it achieves a higher recognition rate than LDA and the traditional methods, and also outperforms the unsupervised reduction methods.
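A minimal sketch of the DCT-plus-supervised-projection pipeline. Since neither ONPP nor LSDA has a standard library implementation, plain LDA from scikit-learn stands in for the supervised projection; the DCT block size and the number of retained components are assumptions.

```python
# Minimal sketch: low-frequency DCT features followed by a supervised
# projection (LDA substituting for LSDA, which the thesis uses).
import numpy as np
from scipy.fft import dctn
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def dct_feature(gray_roi: np.ndarray, block: int = 8) -> np.ndarray:
    """Keep the top-left (low-frequency) DCT block as the feature vector."""
    c = dctn(gray_roi.astype(float), norm="ortho")
    return c[:block, :block].ravel()

# X: (n_samples, 64) DCT features; y: labels of the spoken words.
# lda = LinearDiscriminantAnalysis(n_components=9).fit(X, y)
# X_red = lda.transform(X)   # discriminative low-dimensional features
```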
(5) To cope with samples that contain different numbers of frames, the lip gray energy image (LGEI) is proposed, together with feature extraction methods built on it. The LGEI is obtained by averaging the gray-level lip frames of a sequence, so the feature dimension is normalized during this projection. It preserves the static appearance of the lip images while also reflecting their dynamics, removes the inter-frame correlation that arises when features are extracted from each frame separately, greatly reduces the feature dimension, shortens recognition time and raises the recognition rate. The LGEI also makes it easy to transfer feature extraction methods from face recognition and supervised learning to lip features; on this basis, the DT-CWT+LBP representation and LDA reduction are applied to the LGEI. Experiments show that the traditional representation and reduction methods remain applicable to the LGEI, and that the LGEI-based methods achieve higher recognition rates than the traditional frame-based ones.
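The LGEI computation itself is simple enough to state directly; the sketch below follows the averaging described above, assuming the frames of one utterance have already been cropped to lip ROIs of equal size.

```python
# Minimal sketch: lip gray energy image (LGEI) of one utterance.
# Sequences of any length map to a single fixed-size image.
import numpy as np

def lip_gray_energy_image(frames):
    """frames: iterable of equal-size grayscale lip ROIs (2D arrays)."""
    stack = np.stack([f.astype(float) for f in frames], axis=0)
    return stack.mean(axis=0)   # per-pixel average over the sequence

# The LGEI is then fed to any still-image feature extractor,
# e.g. the DT-CWT+LBP chain above followed by LDA.
```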