Monocular Vision-based Real-time Hand Gesture Recognition System

Original title (Chinese): 基于单目视觉的实时手势识别系统
Author: Wu Kun (吴堃)
Degree level: Master's
Discipline: Computer Science and Technology
Keywords: monocular vision; gesture recognition; Hu moment invariants; Fourier descriptors; multilayer perceptron (MLP)
Degree year: 2009
Advisor: Wang Xuan (王轩)
Discipline code: 081202
Degree-granting institution: Harbin Institute of Technology
Submission date: 2009-12-01

Abstract

With the continued development of computer technology, gesture recognition has become a key technique in human-computer interaction and an active research topic spanning image processing, pattern recognition, and computer vision. It remains a challenging interdisciplinary problem for two reasons: gestures are diverse, ambiguous, and vary in both time and space, and the human hand is a complex deformable object. As part of the National 863 Program project "Anthropomorphic Human-Machine Interaction System Based on Gesture", this thesis studies algorithms for real-time gesture recognition based on monocular vision from three aspects: gesture image preprocessing, gesture feature extraction, and gesture recognition.

This thesis designs and implements a real-time gesture recognition system based on monocular vision. The system recognizes 14 common static gestures captured from a camera in real time and uses the recognition results to control an input method. It consists of three parts. (1) Gesture image preprocessing: experiments show that the hue of human skin varies within a narrow range and clusters distinctly, so the system segments the gesture region in the HSV color space. After segmentation, the image is enhanced (including noise smoothing) and the gesture contour is obtained with Laplacian edge extraction. (2) Gesture feature extraction: after analyzing the candidate features, the thesis adopts a joint feature composed of gesture region features, Hu moment invariants, and Fourier descriptors; the results indicate that this joint feature represents gesture information well. (3) Gesture recognition: because the multilayer perceptron (MLP) has strong pattern recognition ability, the system uses an MLP to classify gestures, and a Bayesian method is also evaluated for comparison.

The experimental results show that the proposed method, which combines the joint feature (gesture region features, Hu moment invariants, and Fourier descriptors) with an MLP classifier, achieves a high recognition rate (97.4%) and satisfies the design criteria of high recognition accuracy and real-time processing.
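
The preprocessing and feature-extraction steps named in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the threshold constants, the function names (skin_mask, hu_first_two, make_image), and the synthetic test frame are all assumptions made for illustration, and only the first two of Hu's seven invariants are computed. The sketch thresholds pixels in HSV space to obtain a binary hand mask, then computes moment invariants of that mask.

```python
import colorsys

# Hypothetical skin thresholds (assumptions for illustration only;
# the abstract does not state the ranges used in the thesis).
HUE_MAX = 50 / 360.0            # hue upper bound, as a fraction of the color wheel
SAT_MIN, SAT_MAX = 0.15, 0.75   # saturation bounds
VAL_MIN = 0.35                  # minimum brightness


def skin_mask(rgb_image):
    """Return a binary mask (rows of 0/1) marking skin-like pixels.

    rgb_image is a list of rows; each row is a list of (r, g, b) tuples in 0..255.
    """
    mask = []
    for row in rgb_image:
        mask_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            is_skin = h <= HUE_MAX and SAT_MIN <= s <= SAT_MAX and v >= VAL_MIN
            mask_row.append(1 if is_skin else 0)
        mask.append(mask_row)
    return mask


def hu_first_two(mask):
    """Return the first two Hu moment invariants (phi1, phi2) of a binary mask."""
    pixels = [(x, y) for y, row in enumerate(mask) for x, p in enumerate(row) if p]
    m00 = len(pixels)  # zeroth moment = foreground area
    if m00 == 0:
        raise ValueError("mask contains no foreground pixels")
    xc = sum(x for x, _ in pixels) / m00  # centroid
    yc = sum(y for _, y in pixels) / m00

    def eta(p, q):
        # Normalized central moment: mu_pq / m00^(1 + (p + q) / 2)
        mu = sum((x - xc) ** p * (y - yc) ** q for x, y in pixels)
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    phi1 = n20 + n02
    phi2 = (n20 - n02) ** 2 + 4 * n11 ** 2
    return phi1, phi2


def make_image(width, height, x0, y0, w, h):
    """Synthetic frame: a skin-toned w-by-h rectangle on a blue background."""
    skin, background = (224, 172, 105), (30, 60, 200)
    return [[skin if x0 <= x < x0 + w and y0 <= y < y0 + h else background
             for x in range(width)]
            for y in range(height)]


if __name__ == "__main__":
    # The same rectangle at two positions yields the same invariants.
    print(hu_first_two(skin_mask(make_image(12, 10, 1, 1, 3, 2))))
    print(hu_first_two(skin_mask(make_image(12, 10, 6, 5, 3, 2))))
```

Because the moments are taken about the centroid and normalized by area, the two values do not change when the hand region moves within the frame, which is the property that makes them useful as classifier inputs. A practical system would more likely rely on an image-processing library for these steps and would tune the thresholds to its camera and lighting.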
