Lipreading Procedure Based on Dynamic Programming
  • Authors: Agnieszka Owczarek (1) agnieszka.owczarek@p.lodz.pl
    Krzysztof Ślot (1) k.slot@p.lodz.pl
  • Keywords: lip contour extraction – dynamic programming – lipreading
  • Series: Lecture Notes in Computer Science
  • Year: 2012
  • Volume: 7267
  • Issue: 1
  • Pages: 559–566
  • Full-text size: 549.0 KB
  • Affiliation: 1. Institute of Electronics, Technical University of Lodz, Wolczanska Street 211/215, 90-924 Lodz, Poland
  • Subject category: Computer Science
  • Subjects: Artificial Intelligence and Robotics
    Computer Communication Networks
    Software Engineering
    Data Encryption
    Database Management
    Computation by Abstract Devices
    Algorithm Analysis and Problem Complexity
  • Publisher: Springer Berlin / Heidelberg
  • ISSN: 1611-3349
Abstract
This paper describes a novel lipreading procedure based on dynamic programming. We propose a new method of outer lip contour extraction and representation. Lip shapes corresponding to a selected group of visemes are first extracted using dynamic programming and then approximated by B-splines. The coordinates of the B-spline control points form the final feature vector used for viseme recognition. Dynamic programming is used to cope with discontinuities in the lip gradient image; its advantage is that it finds the global minimum of the cost function and consequently yields an optimal lip contour. Experiments on Polish-language utterances show that seven viseme classes can be recognized with 75% accuracy.
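The contour-extraction idea in the abstract can be illustrated with a minimal sketch. It rests on an assumption the abstract does not spell out: the lip gradient image is resampled along rays cast from an assumed mouth center, so a closed outer lip contour becomes one radius index per ray, and a hypothetical cost grid `cost[a][r]` holds, for example, the negated gradient magnitude at ray `a`, radius `r`. The function name `min_cost_closed_contour` and the `max_jump` smoothness parameter are illustrative only and are not the authors' code. Because each ray's choice depends only on the previous ray, dynamic programming finds the globally minimal path, which lets strong neighbouring edge segments bridge a weak or broken one.

```python
from typing import List, Sequence


def min_cost_closed_contour(cost: Sequence[Sequence[float]], max_jump: int = 1) -> List[int]:
    """Return one radius index per ray forming a closed minimum-cost contour.

    Illustrative sketch (hypothetical names): cost[a][r] is the cost of
    placing the contour at radius index r on ray a, e.g. the negated
    gradient magnitude sampled along rays from an assumed mouth center.
    Consecutive rays, including the wrap from the last ray back to the
    first, may differ by at most max_jump radius steps.
    """
    n_angles = len(cost)
    n_radii = len(cost[0])
    inf = float("inf")
    best_total, best_path = inf, []

    # Enforce closure by fixing the radius on ray 0, running the DP over the
    # remaining rays, and only accepting end radii close to that start radius.
    for start in range(n_radii):
        acc = [inf] * n_radii          # accumulated cost per radius, current ray
        acc[start] = cost[0][start]
        back = [[0] * n_radii for _ in range(n_angles)]  # backpointers

        for a in range(1, n_angles):
            new = [inf] * n_radii
            for r in range(n_radii):
                # Smoothness constraint: predecessor within max_jump of r.
                lo, hi = max(0, r - max_jump), min(n_radii - 1, r + max_jump)
                prev = min(range(lo, hi + 1), key=acc.__getitem__)
                if acc[prev] < inf:
                    new[r] = acc[prev] + cost[a][r]
                    back[a][r] = prev
            acc = new

        for r in range(n_radii):
            if abs(r - start) <= max_jump and acc[r] < best_total:
                best_total = acc[r]
                # Trace backpointers from the last ray to ray 0.
                path = [r]
                for a in range(n_angles - 1, 0, -1):
                    path.append(back[a][path[-1]])
                best_path = path[::-1]

    return best_path
```

With `max_jump=1`, a strong but isolated spurious edge far from the true contour on a single ray is ignored, because reaching it would add cost on the intermediate rays; with a larger `max_jump` the same path may snap to it, so this parameter trades smoothness against edge fidelity. The resulting radii could then be mapped back to Cartesian points and approximated by a periodic B-spline, for instance with `scipy.interpolate.splprep(..., per=1)`, whose control points would play the role of the feature vector; that step is likewise only an illustration of the pipeline the abstract describes.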