Lipreading Procedure Based on Dynamic Programming


Authors: Agnieszka Owczarek (1) agnieszka.owczarek@p.lodz.pl
Krzysztof Ślot (1) k.slot@p.lodz.pl
Keywords: lip contour extraction – dynamic programming – lipreading
Series: Lecture Notes in Computer Science
Year: 2012
Volume: 7267
Issue: 1
Pages: 559-566
Full-text size: 549.0 KB
Affiliation: 1. Institute of Electronics, Technical University of Lodz, Wolczanska Street 211/215, 90-924 Lodz, Poland
Publication category: Computer Science
Publication subjects: Artificial Intelligence and Robotics
Computer Communication Networks
Software Engineering
Data Encryption
Database Management
Computation by Abstract Devices
Algorithm Analysis and Problem Complexity
Publisher: Springer Berlin / Heidelberg
ISSN: 1611-3349

Abstract

This paper describes a novel lipreading procedure based on dynamic programming. We propose a new method for extracting and representing the outer lip contour. Lip shapes corresponding to a selected group of visemes are first extracted using dynamic programming and then approximated by B-splines. The coordinates of the B-spline control points form the final feature vector used for the viseme recognition task. Discontinuities in the lip gradient image are handled by the dynamic programming technique, which has the advantage of finding the global minimum and therefore the optimal lip contour. Experiments on Polish-language utterances show that seven viseme classes can be recognized with 75% accuracy.
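
The abstract does not give the cost function, search geometry, or smoothness constraint used in the paper, so the following is only a minimal sketch of the general idea: dynamic programming finds a globally minimal-cost path across a cost map, one row per column, even where the low-cost ridge is broken. The function name min_cost_contour, the max_jump parameter, and the toy cost map are assumptions made for illustration, not the authors' formulation.

```python
def min_cost_contour(cost, max_jump=1):
    """Return (path, total): path[x] is the row chosen in column x.

    cost is a list of rows (height x width), e.g. an inverted gradient
    magnitude, so that strong edges are cheap. Consecutive columns may
    differ by at most max_jump rows (hypothetical smoothness constraint).
    """
    height, width = len(cost), len(cost[0])
    # acc[y][x]: cheapest total cost of any admissible path ending at (y, x)
    acc = [[0] * width for _ in range(height)]
    # back[y][x]: row in column x - 1 from which that cheapest path arrives
    back = [[0] * width for _ in range(height)]
    for y in range(height):
        acc[y][0] = cost[y][0]
    for x in range(1, width):
        for y in range(height):
            lo = max(0, y - max_jump)
            hi = min(height - 1, y + max_jump)
            best = min(range(lo, hi + 1), key=lambda p: acc[p][x - 1])
            acc[y][x] = cost[y][x] + acc[best][x - 1]
            back[y][x] = best
    # Pick the cheapest end point, then follow the back-pointers to column 0.
    y = min(range(height), key=lambda r: acc[r][width - 1])
    total = acc[y][width - 1]
    path = [y]
    for x in range(width - 1, 0, -1):
        y = back[y][x]
        path.append(y)
    path.reverse()
    return path, total


# Toy cost map: the cheap ridge in row 1 is interrupted in column 2.
toy_cost = [
    [5, 5, 1, 5, 5],
    [1, 1, 5, 1, 1],
    [5, 5, 5, 5, 5],
]
path, total = min_cost_contour(toy_cost)
print(path, total)
```

On the toy map the path leaves row 1 for a single column to avoid the expensive cell, which is the kind of gap a purely local edge follower could get stuck on. In a pipeline like the one the abstract outlines, the resulting contour points would then be approximated by a B-spline, whose control points serve as features.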
