Detection and Processing of Visual Information in a Lipreading-Driven Speech Synthesizer
Abstract
To restore spoken expression to people who lost their voice through later-life injury or illness but who still retain correct lip-shape articulation, this research explores a lip-shape recognition and speech synthesis system based on visual information: the visual information extracted from lip image sequences is analyzed and recognized as a special kind of language. The study examines in depth several fundamental problems in detecting and processing this visual information, including the correspondence between visual and acoustic information, the amount of information carried by the lip region and the lip contour, the selection of an optimal feature vector for a lipreading system, and methods for extracting and recognizing lip-shape features automatically and effectively.
     The main research contents of the dissertation include:
     1. By analyzing the characteristics of frontal-view and profile-view face images, a new asymmetric lip-contour model is established. The model describes not only lip height and width but also the degree of lip protrusion, and the time derivatives of its parameters are computed to capture the dynamic behavior of the lip contour. Different combinations of the feature parameters are compared to assess how parameter selection affects recognition performance. Experiments on isolated Mandarin syllables show that the model raises the recognition rate by more than 25% on average. On the basis of this model, a bimodal Mandarin speech database of commonly used Chinese characters, oriented toward disabled users, is designed and built.
     2. Motion detection and mathematical morphology are applied to the gray-scale images of the lip-movement sequence to obtain the lip region and the lip contour. From the lip region, the projected lip width W, the outer-contour height H, and the projected lip protrusion F are extracted, and their time derivatives dW/dt, dH/dt, and dF/dt are computed as difference parameters; together these form a six-dimensional geometric feature vector.
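The six-dimensional geometric vector described above can be sketched as follows. The frame interval `dt` and the use of numerical gradients for the time derivatives are illustrative assumptions, not the dissertation's own implementation:

```python
import numpy as np

def geometric_features(W, H, F, dt=1.0):
    """Build one 6-D geometric feature vector per frame from the
    lip-width projection W, outer-contour height H, and protrusion
    projection F (1-D sequences over the frames).

    The names W, H, F follow the text; dt (frame interval) is an
    assumption -- the source does not fix a sampling rate here.
    """
    W, H, F = (np.asarray(x, dtype=float) for x in (W, H, F))
    # Finite differences approximate dW/dt, dH/dt, dF/dt;
    # np.gradient keeps the output the same length as the input.
    dW = np.gradient(W, dt)
    dH = np.gradient(H, dt)
    dF = np.gradient(F, dt)
    # One 6-D vector (W, H, F, dW/dt, dH/dt, dF/dt) per frame.
    return np.stack([W, H, F, dW, dH, dF], axis=1)
```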
     3. The discrete Fourier transform (DFT) and the discrete cosine transform (DCT) are used to obtain Fourier descriptors and DCT descriptors of the lip contour. Each class of descriptors is taken as the feature vector of the lip contour and learned and recognized with a hidden Markov model (HMM), so that the ability of the two classes of descriptors to characterize the lip contour can be compared.
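A minimal sketch of the Fourier-descriptor computation on a sampled lip contour follows. The truncation length `n_coeff` and the magnitude normalization are illustrative assumptions; the dissertation's exact normalization may differ:

```python
import numpy as np

def fourier_descriptors(contour, n_coeff=16):
    """Fourier descriptors of a closed lip contour.

    `contour` is an (N, 2) array of (x, y) boundary points sampled
    along the outer lip contour; n_coeff is an assumed truncation
    length for the descriptor vector.
    """
    pts = np.asarray(contour, dtype=float)
    # Represent the boundary as a complex sequence z(n) = x(n) + j*y(n).
    z = pts[:, 0] + 1j * pts[:, 1]
    Z = np.fft.fft(z)
    mags = np.abs(Z)
    # Drop Z[0] (translation), divide by |Z[1]| (scale), and keep
    # magnitudes only (rotation / start-point invariance).
    return mags[1:n_coeff + 1] / mags[1]
```

For a perfect circle, only the first descriptor is nonzero after normalization, which is a quick sanity check on the invariances.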
     4. Feature fusion is applied to improve the discriminative power of the feature vectors. Using weighted serial combination, the geometric feature vector of the lip region and the contour feature vector formed by the DCT descriptors are fused into a new feature vector; an HMM then learns and recognizes the fused features, and the recognition results under different weighting factors are analyzed.
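The weighted serial combination can be sketched as below. The z-score normalization of each stream before weighting is an added assumption, so that neither stream dominates by numeric range alone:

```python
import numpy as np

def fuse_features(geom, dct, alpha=0.5):
    """Weighted serial (concatenation) fusion of the geometric vector
    and the DCT contour-descriptor vector for one frame.

    alpha is the weighting factor varied experimentally in the text;
    the per-stream z-score normalization is an illustrative assumption.
    """
    def znorm(v):
        v = np.asarray(v, dtype=float)
        s = v.std()
        return (v - v.mean()) / s if s > 0 else v - v.mean()
    # Serial fusion: the fused vector is the weighted concatenation
    # [alpha * geom_norm, (1 - alpha) * dct_norm].
    return np.concatenate([alpha * znorm(geom),
                           (1.0 - alpha) * znorm(dct)])
```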
     5. A second-order HMM is chosen to learn and recognize the lip-feature parameter sequences. Because it exploits the contextual dependence between the lip-shape feature vectors of successive frames, it fits the way Mandarin is pronounced better than a first-order model does. Experiments compare the recognition ability of first-order and second-order HMMs on the same feature vectors.
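One standard way to implement a second-order HMM is to convert its transition tensor into an equivalent first-order model over ordered state pairs, so that ordinary first-order training and decoding code can be reused. The sketch below illustrates that equivalence and is not the dissertation's own code:

```python
import numpy as np

def hmm2_to_hmm1(A2):
    """Convert a second-order HMM transition tensor into an equivalent
    first-order transition matrix over composite state *pairs*.

    A2[i, j, k] = P(q_t = k | q_{t-2} = i, q_{t-1} = j).  Each ordered
    pair (i, j) becomes one composite state, so an n-state second-order
    model maps to an n^2-state first-order model.
    """
    n = A2.shape[0]
    A1 = np.zeros((n * n, n * n))
    for i in range(n):
        for j in range(n):
            for k in range(n):
                # Composite state (i, j) can only move to pairs (j, k),
                # since the most recent state must be carried forward.
                A1[i * n + j, j * n + k] = A2[i, j, k]
    return A1
```

Rows of the composite matrix still sum to one, and transitions that would contradict the carried-forward state are structurally zero.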
