A Human-computer Interaction System Using Single Camera

Chinese title: 使用单摄像机的人机交互系统
Author: Yuan Xin (袁昕)
Degree level: Master's
Discipline: Computer Application Technology
Keywords: Human-Computer Interaction; Human Action Recognition; Computer Vision; Single Camera; Two-Layered Background Subtraction; Star Skeleton; Support Vector Machine
Degree year: 2010
Advisor: Yang Xubo (杨旭波)
Discipline code: 081203
Degree-granting institution: Shanghai Jiao Tong University
Thesis submission date: 2010-01-01

Abstract

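Human-computer interaction (HCI) is the technical discipline that studies how people and computers communicate through mutual understanding, so that computers can carry out information management, service, and processing functions for people to the greatest possible extent. HCI technology has gradually shifted from being computer-centered to being user-centered. Human action is a natural, intuitive, and easily learned means of interaction: when the body itself serves as the computer's input device, communication between person and machine needs no additional medium, and users can control surrounding machines simply by defining suitable actions. Action input therefore holds an important place in HCI technology. With the development of computer vision and image processing, vision-based action recognition has drawn increasing attention and has been introduced into HCI systems.
     From the perspective of computer vision, this thesis studies techniques for achieving human-computer interaction by recognizing human actions with a single camera. It discusses and analyzes relevant techniques and methods from computer vision, image processing, and pattern recognition. In light of the needs of practical applications, we also improve and integrate several of these methods.
     First, the thesis systematically describes the main pipeline of an action recognition system and the technical fields involved, and analyzes and discusses the methods commonly used in current action recognition applications. The survey follows the framework and steps of a typical system implementation, so that researchers new to the field can gain a comprehensive understanding of action recognition.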
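     The thesis then treats each of the relevant techniques in detail:
     Background subtraction is a necessary step in a vision-based action recognition system. We summarize the basic background subtraction methods and describe background differencing and the mixture-of-Gaussians model in detail. To remove shadows effectively, we further propose a two-layered background subtraction method.
     Parameterizing the human body abstracts complex silhouette image information into parameters that are easy to recognize. We summarize the main approaches to parameterizing human body images and analyze the conditions under which each applies. Based on the needs of an HCI system, we focus on how to construct the star skeleton model.
     Action recognition is the key part of the whole system. The common methods are of two kinds, example-based and learning-based. We analyze both and select one concrete implementation of each: the star distance method and the support vector machine. Recognizing each frame yields a sequence of posture features, and a posture-buffer mechanism matches this sequence to actions.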
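     The contributions of the thesis are suitable improvements to traditional algorithms, combined with recent research results, which raise algorithm performance and yield a practical interaction system. Specifically:
     We propose a two-layered background subtraction method that uses the chromaticity and gradient information in the image. Through two subtraction steps, it largely resolves the interference of shadows with the human silhouette during foreground segmentation. This greatly relaxes the system's requirements on its operating environment and improves its usability and stability.
     In the recognition stage, we combine the star skeleton model with the support vector machine. An interaction system places high demands on efficiency; the star model converts human action information into simple, easily recognized parameters while retaining the features of the action fairly completely. Using this model information as the feature space of the support vector machine gives efficient recognition with good results.
     Using the improved algorithms, we implement a complete human action recognition system that achieves real-time interaction, and verify the algorithms' performance and effectiveness with extensive tests and experimental data. The system framework is also general and extensible, with potential for further improvement.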
Human-Computer Interaction (HCI) is the study of how humans and computers communicate, and of how to provide information management, services, and transactions to users conveniently. HCI has shifted from a computer-centered to a user-centered approach. Human action is a natural, direct, and easy way to operate computers. No intermediate medium is needed when the human body itself serves as the input device, and users can control machines simply with suitable predefined actions. Input by human action is therefore important in HCI. With the development of computer vision and digital image processing, vision-based action recognition has become an active research topic in HCI systems.
     This thesis focuses on vision-based HCI through human action recognition with a single camera. We discuss techniques from computer vision, digital image processing, and pattern recognition. Guided by application requirements, we improve and integrate several methods.
     First, this thesis summarizes the main steps of a human action recognition system and the related fields. Common current methods for action recognition are analyzed and discussed. The survey follows the framework and processing order of a typical recognition system, so newcomers to this research field can get a general outline of action recognition.
     We then describe the related techniques in detail:
     Background subtraction is an essential step in a vision-based action recognition system. We summarize the basic background subtraction methods and describe background differencing and the mixture of Gaussians in detail. Furthermore, we propose a two-layered background subtraction method for removing shadows.
     Parameterization of the human body converts a complicated contour image into parameters that are easy to recognize. We summarize the main parameterization methods and analyze the cases each one suits. Given the requirements of an HCI system, we discuss the Star Skeleton in detail.
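As a rough illustration of how a star skeleton can be built from a contour, the sketch below follows the commonly described recipe: take the contour centroid, measure the distance from it to each boundary point, smooth that distance signal, and keep its local maxima as the tips of the star. The function name `star_skeleton`, the `half` smoothing parameter, and the synthetic four-lobed contour are assumptions made for this example; the construction used in the thesis may differ.

```python
import math

def star_skeleton(contour, half=2):
    """Return (centroid, tips) for a closed contour given as ordered (x, y) tuples."""
    n = len(contour)
    cx = sum(x for x, _ in contour) / n
    cy = sum(y for _, y in contour) / n
    # Distance from the centroid to every boundary point, in contour order.
    dist = [math.hypot(x - cx, y - cy) for x, y in contour]
    # Circular moving average (window 2*half+1) to suppress boundary noise.
    width = 2 * half + 1
    smooth = [sum(dist[(i + k) % n] for k in range(-half, half + 1)) / width
              for i in range(n)]
    # Local maxima of the smoothed signal are taken as the star's extremal points.
    tips = [contour[i] for i in range(n)
            if smooth[i] > smooth[i - 1] and smooth[i] >= smooth[(i + 1) % n]]
    return (cx, cy), tips

# Synthetic closed contour with four lobes (hypothetical test shape, not thesis data).
contour = [((10 + 5 * math.cos(4 * t)) * math.cos(t),
            (10 + 5 * math.cos(4 * t)) * math.sin(t))
           for t in (2 * math.pi * i / 72 for i in range(72))]
centroid, tips = star_skeleton(contour)
```

Joining `centroid` to each entry of `tips` gives the star; for a human silhouette the tips would typically fall near the head, hands, and feet.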
     Action recognition is the key module of the system. Common methods fall into two classes: example-based and learning-based. We analyze both classes and choose Star Distance and the Support Vector Machine (SVM), respectively, as concrete instances. After the system estimates the posture in every frame, a Posture Buffer matches the posture sequence to actions.
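The abstract does not say how the Posture Buffer works internally, so the following is only one plausible reading: per-frame posture labels are pushed into a bounded buffer, repeated labels are collapsed, and the tail of the buffer is compared against action templates. The class name `PostureBuffer`, the posture labels, and the `wave` template are all hypothetical.

```python
from collections import deque

class PostureBuffer:
    """Match the most recent posture changes against action templates."""

    def __init__(self, actions, maxlen=30):
        self.actions = actions              # action name -> tuple of posture labels
        self.buffer = deque(maxlen=maxlen)  # oldest entries are dropped automatically

    def push(self, posture):
        # Store only posture changes, so holding a pose for many frames counts once.
        if not self.buffer or self.buffer[-1] != posture:
            self.buffer.append(posture)
        recent = tuple(self.buffer)
        for name, pattern in self.actions.items():
            if len(recent) >= len(pattern) and recent[-len(pattern):] == tuple(pattern):
                self.buffer.clear()         # consume the postures that formed the action
                return name
        return None

# Hypothetical per-frame classifier output for a short clip.
frames = ["arm_down"] * 3 + ["arm_up"] * 2 + ["arm_down"]
matcher = PostureBuffer({"wave": ("arm_down", "arm_up", "arm_down")})
events = [name for name in (matcher.push(p) for p in frames) if name]
```

Collapsing repeated labels makes the match independent of how long each pose is held, which matters when frame rates or user speed vary.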
     The contributions of this thesis are improvements to several basic algorithms, better performance through the integration of recent research, and a practical interaction system. In detail:
     We propose a two-layered background subtraction method based on both the chromaticity and the gradient of the image. It removes shadows from the human contour, which makes the system more robust to its environment and improves usability and stability.
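The idea of subtracting in two steps can be sketched as follows, under strong simplifying assumptions. Step 1 flags pixels that differ from a static background image; step 2 rejects flagged pixels that are darker than the background but keep its chromaticity, which is the usual signature of a cast shadow. Only the chromaticity cue is shown; the gradient cue mentioned above is left out, and the function names and thresholds (`t_diff`, `t_chroma`) are hypothetical rather than taken from the thesis.

```python
def chromaticity(pixel):
    """Brightness-normalised colour (r, g) of an (R, G, B) pixel."""
    r, g, b = pixel
    total = r + g + b
    if total == 0:
        return (1 / 3, 1 / 3)  # treat pure black as neutral
    return (r / total, g / total)

def two_step_mask(frame, background, t_diff=30, t_chroma=0.05):
    """Return a boolean foreground mask; images are nested lists of RGB tuples."""
    mask = []
    for frame_row, bg_row in zip(frame, background):
        mask_row = []
        for p, q in zip(frame_row, bg_row):
            # Step 1: coarse change detection on the largest per-channel difference.
            changed = max(abs(a - b) for a, b in zip(p, q)) > t_diff
            # Step 2: a darker pixel whose chromaticity is unchanged is taken as shadow.
            (r1, g1), (r2, g2) = chromaticity(p), chromaticity(q)
            chroma_shift = abs(r1 - r2) + abs(g1 - g2)
            is_shadow = sum(p) < sum(q) and chroma_shift <= t_chroma
            mask_row.append(changed and not is_shadow)
        mask.append(mask_row)
    return mask

# One-row toy image: unchanged pixel, shadowed pixel, coloured foreground pixel.
background = [[(120, 120, 120)] * 3]
frame = [[(120, 120, 120), (60, 60, 60), (200, 60, 40)]]
mask = two_step_mask(frame, background)
```

A chromaticity test alone misclassifies dark gray clothing as shadow, which is one reason a second cue such as gradient is useful.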
     We integrate the Star Skeleton and the SVM in the recognition step. An interaction system demands high efficiency, so we use the Star Skeleton to abstract human action into simple parameters that are easy to recognize and retain the main features. Using this model as the SVM feature space gives efficient recognition with promising results.
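To make the pairing of skeleton features and an SVM concrete, the sketch below trains a tiny linear SVM by sub-gradient descent on the regularised hinge loss. It stands in for a full SVM library, and the two-value feature (heights of the two hand tips relative to the centroid, scaled by body height), the sample values, and the function names are assumptions for illustration only, not the thesis's feature layout.

```python
def train_linear_svm(samples, labels, epochs=200, lr=0.1, reg=0.01):
    """Sub-gradient descent on the L2-regularised hinge loss; labels are +1 or -1."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # The regulariser shrinks w slightly on every step.
            w = [wi - lr * reg * wi for wi in w]
            if margin < 1:
                # Hinge loss is active: push the hyperplane away from this sample.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Hypothetical features: (left hand tip height, right hand tip height).
arms_up = [(0.9, 0.8), (0.8, 0.9), (1.0, 0.7)]
arms_down = [(-0.6, -0.7), (-0.8, -0.5), (-0.7, -0.9)]
w, b = train_linear_svm(arms_up + arms_down, [1, 1, 1, -1, -1, -1])
```

Calling `predict(w, b, (0.85, 0.9))` then classifies a new frame's feature vector; a multi-posture system would need one such classifier per pair or per class.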
     With the improved algorithms, we implement a complete human action recognition system and achieve real-time interaction. A large number of test cases and experimental results demonstrate its performance and effectiveness. In addition, the system framework is general and easy to extend, and has potential for further improvement.

