Lip Tracking Method for the System of Audio-Visual Polish Speech Recognition

Authors: Mariusz Kubanek (1) mariusz.kubanek@icis.pcz.pl
Janusz Bobulski (1) januszb@icis.pcz.pl
Lukasz Adrjanowicz (1) lukasz.adrjanowicz@icis.pcz.pl
Keywords: lip reading; visual speech; audio visual speech recognition
Journal: Lecture Notes in Computer Science
Publication year: 2012
Volume: 7267
Issue: 1
Pages: 535-542
Full-text size: 352.8 KB
Affiliation: 1. Institute of Computer and Information Science, Czestochowa University of Technology, Dabrowskiego Street 73, 42-200 Czestochowa, Poland
Journal category: Computer Science
Journal subjects: Artificial Intelligence and Robotics
Computer Communication Networks
Software Engineering
Data Encryption
Database Management
Computation by Abstract Devices
Algorithm Analysis and Problem Complexity
Publisher: Springer Berlin / Heidelberg
ISSN: 1611-3349

Abstract

This paper proposes a lip-tracking method for an audio-visual speech recognition system. The presented method consists of a face detector, a face tracker, a lip detector, a lip tracker, and a word classifier. In speech recognition systems, the audio signal is exposed to a large amount of acoustic noise, so researchers are looking for ways to reduce the effect of audio interference on recognition results. Visual speech is one source of information that is not perturbed by the acoustic environment and noise. To analyze visual speech, a lip-tracking method must first be developed. This work presents a method for automatically detecting the outer edges of the lips, which was used to identify individual words in audio-visual speech recognition. The paper also shows how to use visual speech to divide the audio signal into phonemes.
