Research on Key Techniques of Vision-Based Large-Range Head Pose Tracking
Abstract
3D head pose tracking is an important problem in computer vision and human-computer interaction, and a research direction that has attracted increasing attention in recent years. Its goal is to determine the pose parameters of the head in 3D space by analyzing an input image sequence. 3D head pose tracking has broad application prospects in human-computer interaction, intelligent surveillance, video compression and coding, face recognition, expression recognition, fatigue detection, and body-controlled games and entertainment.
Existing head pose estimation methods fall into two broad categories: statistical-learning-based methods and registration-based methods. Statistical-learning-based methods assume a correspondence between the head pose parameters and certain facial features, and learn this correspondence by training on a large set of sample images covering different poses. Such methods are sensitive to how the features are defined and usually require interpolating the pose parameters, so their results are not very accurate. Registration-based methods typically assume the head is a rigid object and compute the pose parameters by tracking feature points from frame to frame. The choice of features varies widely across implementations. One approach tracks salient feature points such as the mouth corners, nose tip, and eye corners; tracking degrades when these points are occluded. Another approach selects feature points dynamically during tracking and automatically replaces points that are lost, which is more robust. Overall, registration-based methods are easy to implement and achieve high tracking accuracy.
Most existing head pose tracking algorithms assume the subject exhibits no or only slight body movement, e.g. a user sitting in a chair. In daily life, however, people express their direction of attention, attitude, and feelings through head pose both while seated in a fixed position and while their body is in motion. We define head pose tracking under body movement as large-range head pose tracking. Compared with conventional small-range head pose tracking, large-range head pose tracking can be applied more conveniently in human-computer interaction, intelligent surveillance, action recognition, and other fields.
This thesis adopts a registration-based approach to large-range head pose tracking. When the body moves over a large range, however, large changes in the pose parameters reduce registration accuracy; frame-by-frame tracking accumulates drift over long sequences; and computing 3D pose parameters additionally requires depth information for the corresponding head feature points. We therefore propose a tracking method that combines a registration algorithm based on local feature descriptors with a view-based appearance model. The method divides pose tracking into three main parts: first, acquire the video and the corresponding depth information, which can be obtained either from a stereo camera or by stereo matching; second, compute the pose change between two frames with a registration algorithm based on local descriptors; third, use the appearance model to eliminate drift accumulated during tracking. Compared with previous work, the main contributions of this thesis are as follows:
1. A registration algorithm based on Scale-Invariant Feature Transform (SIFT) descriptors. Matching SIFT feature points are first found in two grayscale frames, and the depth of these matches is then obtained from a stereo camera or by stereo matching. To suppress the influence of false matches, the head motion is finally computed with a motion estimation method based on random sample consensus (RANSAC). The SIFT-based registration algorithm achieves high tracking accuracy and continues to track even when the scale changes between frames, making it well suited to large-range head pose tracking. It is the first registration algorithm proposed specifically for large-range head pose tracking and has had some impact in the field; the paper introducing it has been cited by several international peers.
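The rigid-motion step of this contribution can be sketched as follows. This is a minimal illustration, not the thesis implementation: the function names, tolerance, and iteration count are assumptions, and the standard SVD-based (Kabsch) solution is used for the rigid transform between matched 3D point sets inside a RANSAC loop.

```python
import numpy as np

def rigid_fit(P, Q):
    """Least-squares rotation R and translation t mapping points P onto Q
    (Kabsch / absolute-orientation solution via SVD)."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)                      # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    R = Vt.T @ D @ U.T
    return R, cq - R @ cp

def ransac_rigid(P, Q, iters=200, tol=0.01, seed=0):
    """Estimate (R, t) between two sets of matched 3D feature points
    while rejecting outliers (false SIFT matches)."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(iters):
        idx = rng.choice(len(P), size=3, replace=False)   # minimal sample
        R, t = rigid_fit(P[idx], Q[idx])
        err = np.linalg.norm(P @ R.T + t - Q, axis=1)     # residual per match
        inliers = err < tol
        if best is None or inliers.sum() > best.sum():
            best = inliers
    return rigid_fit(P[best], Q[best])                    # refit on all inliers
```

Sampling only three correspondences per hypothesis keeps each iteration cheap; the final refit over the consensus set gives the head-motion estimate between the two frames.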
2. A compact feature descriptor, KPB-SIFT (Kernel Projection Based SIFT). The SIFT detector first computes the location, scale, and dominant orientation of each feature point; a low-dimensional descriptor is then obtained by applying kernel projection to the oriented gradient information in the point's neighborhood. Compared with SIFT, KPB-SIFT matches descriptors significantly faster while remaining highly distinctive, and it behaves robustly under illumination changes and geometric distortions.
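The kernel-projection idea can be illustrated with a toy sketch. The kernel choice and output dimension below are illustrative assumptions, not the thesis's actual kernels: the high-dimensional oriented-gradient vector around a keypoint is projected onto a small set of orthogonal kernels, here rows of a Walsh-Hadamard matrix, yielding a compact descriptor.

```python
import numpy as np

def hadamard(n):
    """Walsh-Hadamard matrix of order n (n must be a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def kernel_project(grad_vec, out_dim=32):
    """Project an oriented-gradient vector (e.g. a 128-D, SIFT-like
    histogram) onto the first out_dim projection kernels to obtain a
    compact, unit-normalized descriptor."""
    n = len(grad_vec)
    K = hadamard(n)[:out_dim] / np.sqrt(n)   # orthonormal kernel rows
    d = K @ grad_vec
    norm = np.linalg.norm(d)
    return d / norm if norm > 0 else d
```

Matching 32-D vectors instead of 128-D ones is what buys the faster descriptor matching described above.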
3. A view-based appearance model. The model eliminates the drift of frame-by-frame tracking through multiple registrations: besides being registered against its previous frame, the current frame can also be registered against one or two key frames. Specifically, key frames selected from the input sequence form an appearance model of the head; each key frame is annotated with its pose parameters, and the head region is precisely extracted from each key frame as a head view. When the subject moves over a large range, the current frame is registered against any key frame whose head view is sufficiently close to its own. The results of the multiple registrations are smoothed with a Kalman filter to obtain the final pose parameters. The view-based appearance model not only reduces drift during tracking; it can also rapidly recover the head pose when the head leaves and re-enters the camera's field of view or moves far from the camera.
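The fusion of the frame-to-frame and frame-to-keyframe registration results can be sketched as sequential Kalman measurement updates on the pose vector. This is a minimal linear-Gaussian sketch with an identity observation model; the thesis's actual state model and noise parameters are not specified here and may differ.

```python
import numpy as np

def kalman_fuse(x_prior, P_prior, measurements):
    """Fuse several noisy pose measurements, e.g. one registration against
    the previous frame and one against a key frame, into one smoothed pose
    estimate. Each measurement is (z, R): a pose vector and its covariance."""
    x, P = np.asarray(x_prior, float), np.asarray(P_prior, float)
    for z, R in measurements:
        K = P @ np.linalg.inv(P + R)             # Kalman gain (H = I)
        x = x + K @ (np.asarray(z, float) - x)   # state update
        P = (np.eye(len(x)) - K) @ P             # covariance update
    return x, P
```

With a vague prior and two equally trusted registrations, the fused pose lands between the two measurements, weighted by their covariances.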
4. A fast local feature descriptor suited to dense stereo matching, SULD (Speeded-Up Local Descriptor), used to find corresponding points during stereo matching. To generate the descriptor, the image is first filtered with Haar functions; the response maps are then smoothed repeatedly with Gaussian kernels; next, sample locations are computed and sample vectors extracted; finally, the sample vectors are normalized to form the descriptor. By using Haar response information and a compact descriptor form, SULD is fast to compute in both the construction and the matching stage. Using SULD as the similarity measure solves the stereo matching problem for weakly textured images such as faces, producing the depth information that supplies the depth constraint for monocular head pose tracking. Head depth information also has wide applications in human-computer interaction, expression recognition, and games and entertainment.
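A toy version of that four-stage pipeline might look like the following. The filter, grid, and smoothing parameters are illustrative assumptions, and simple central differences stand in for the Haar wavelet responses.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian smoothing via two 1-D convolutions."""
    r = int(3 * sigma)
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2 * sigma**2)); k /= k.sum()
    out = np.apply_along_axis(lambda m: np.convolve(m, k, mode="same"), 0, img)
    return np.apply_along_axis(lambda m: np.convolve(m, k, mode="same"), 1, out)

def suld_like(img, x, y, grid=4, step=3, sigma=1.5):
    """Four stages: (1) Haar-like derivative filtering, (2) Gaussian
    smoothing of the response maps, (3) sampling on a grid around (x, y),
    (4) normalization into the final descriptor."""
    img = img.astype(float)
    dx = np.zeros_like(img); dx[:, 1:-1] = img[:, 2:] - img[:, :-2]  # horizontal response
    dy = np.zeros_like(img); dy[1:-1, :] = img[2:, :] - img[:-2, :]  # vertical response
    maps = [gaussian_blur(m, sigma) for m in (dx, dy, np.abs(dx), np.abs(dy))]
    offs = (np.arange(grid) - (grid - 1) / 2) * step   # grid offsets around (x, y)
    vec = np.array([m[int(y + oy), int(x + ox)]
                    for m in maps for oy in offs for ox in offs])
    n = np.linalg.norm(vec)
    return vec / n if n > 0 else vec
```

Because the descriptor is built from derivative responses, it is invariant to additive brightness changes, which helps when matching weakly textured face regions across two views.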
In the course of this research we also implemented HPObserver, a head pose tracking prototype system that integrates video capture, depth acquisition, pose computation, and result evaluation. HPObserver provides a complete and convenient test platform for validating the key techniques and for follow-up research.
Experiments on multiple head motion sequences show that the proposed method tracks head motion robustly, even under large body movement, the head leaving and re-entering the camera's field of view, partial face occlusion, and marked changes in facial expression. The thesis closes by analyzing the main limitations of the proposed method and outlining directions for future research.
