Exploration and Research on Human-Computer Interaction Based on Face Recognition
Abstract
The invention of the computer has had a profound impact on human society. As hardware computing speeds rise and storage capacities grow, new application software and management systems are constantly being developed, and the way people interact with computers is undergoing deep change. The graphical user interface has made computers far easier for ordinary users to use, yet it also imposes many restrictions, so that computers still struggle to integrate fully into people's work and life. People want computers that can listen, see, and speak, perhaps even better than humans can, and that can do so in real time; a new mode of interaction has therefore become an urgent need in the development of computer technology. The personification of computer systems, together with their miniaturization, portability, and embedding, will be two important application trends, and human-computer interaction technology is the bottleneck between them. Human-centered, natural, and efficient interaction will be the main direction for the next generation of human-computer interfaces.
     One goal of pervasive computing is to make the computer invisible and to require no conscious operation of it, while still providing ubiquitous computing and information services. Among the new interaction techniques based on computer vision, speech recognition, gesture input, and sensory feedback, face recognition has become a research hotspot in human-computer interaction because of its great application prospects in identity verification, records management, and video conferencing. Gaze tracking, which can replace the keyboard and mouse for input and pointer movement, is attractive to people with limited mobility and to pilots, and has drawn wide attention from psychologists and interaction researchers.
     The author surveyed a large number of recent domestic and international papers on human-computer interaction and computer vision, examined the basic design methods of human-computer interaction and the framework of pervasive computing, described the general process and methods of face detection and recognition, discussed their concrete implementation in some depth, and proposed a framework protocol for extending third-party software so that computer-vision-based interaction can be embedded seamlessly into existing systems. The main work of this thesis includes the following:
     (1) Summarizes the significance and application prospects of the topic, reviews the state of the art of human-computer interaction and computer vision at home and abroad, and outlines future development trends;
     (2) Briefly analyzes the interaction framework of pervasive computing, discusses two design methods for the interaction process, illustrates their use in a practical project, and compares the two design processes;
     (3) Focuses on the general methods of face detection and recognition, discusses in detail the algorithms implemented in a currently popular open-source computer vision library, and uses them to develop a face recognition system (a minimal sketch is given after this list);
     (4) Proposes implementing computer-vision-based interaction as a framework protocol program that extends third-party applications: by defining a script file that describes the desired interaction behavior, the system can respond to eye movements captured by a camera (an illustrative sketch also follows this list).
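     The abstract does not reproduce the implementation behind point (3). The following is a minimal Python sketch of a detect-then-identify pipeline, under the assumption that the open-source computer vision library in question is OpenCV; the cascade file ships with OpenCV, while the model file name ("faces.yml") and the 100x100 crop size are placeholders rather than values from the thesis.

    # Minimal sketch of the detect-then-identify pipeline summarized in (3),
    # assuming OpenCV (cv2) with the contrib "face" module.
    import cv2

    # Pretrained frontal-face Haar cascade shipped with OpenCV.
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    def detect_faces(frame):
        """Return bounding boxes (x, y, w, h) for faces in a BGR frame."""
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        gray = cv2.equalizeHist(gray)            # simple lighting normalization
        return cascade.detectMultiScale(gray, scaleFactor=1.1,
                                        minNeighbors=5, minSize=(60, 60))

    # Eigenface-style recognizer (opencv-contrib), trained offline on cropped,
    # uniformly sized grayscale face images with integer identity labels.
    recognizer = cv2.face.EigenFaceRecognizer_create()
    # recognizer.read("faces.yml")               # placeholder model file

    def identify(frame, box):
        """Crop a detected face and return (label, distance) from the recognizer."""
        x, y, w, h = box
        face = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2GRAY)
        face = cv2.resize(face, (100, 100))      # recognizer expects a fixed size
        return recognizer.predict(face)

     Any trained recognizer could stand in for the eigenface model; the point being illustrated is the structure of detection followed by identification.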
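     Point (4) describes a framework protocol driven by a script description file, but the abstract gives neither the protocol nor the script format. The sketch below is therefore a hypothetical illustration: the JSON layout, event names, and dispatch function are assumptions, shown only to make the idea of a declarative interaction description for a third-party application concrete.

    # Hypothetical illustration of the "interaction description script" idea in (4).
    import json

    # Example script file (e.g. "eye_actions.json") mapping eye events detected by
    # the vision module to commands understood by the third-party host application:
    # {
    #   "blink_left":  {"command": "page_up"},
    #   "blink_right": {"command": "page_down"},
    #   "gaze_dwell":  {"command": "click", "dwell_ms": 800}
    # }

    def load_bindings(path):
        """Read the declarative eye-event -> command bindings from a script file."""
        with open(path, encoding="utf-8") as f:
            return json.load(f)

    def dispatch(event, bindings, send_to_host):
        """Forward a detected eye event to the host application, if it is bound."""
        binding = bindings.get(event)
        if binding is not None:
            send_to_host(binding["command"])     # host executes the mapped action

    # Usage: events from the camera-based eye tracker drive an existing application
    # without modifying its source code.
    # bindings = load_bindings("eye_actions.json")
    # dispatch("blink_left", bindings, send_to_host=print)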