Research on Techniques for Video-Based 3D Human Motion Recovery
Abstract
Video-based human motion analysis is an important research direction in computer vision and computer graphics, and recovering 3D human motion from images and videos without reflective markers is an active and important topic within it, with direct applications in 3D computer animation, motion capture, natural human-computer interaction, intelligent video surveillance, and many other areas. This thesis carries out the following series of studies on video-based 3D human motion recovery:
     A foreground extraction method based on non-parametric motion estimation and image registration is proposed for videos captured under camera motion; the extracted foreground human silhouettes are used for subsequent 3D motion recovery. Non-parametric motion estimation is performed on the training background images, which are then modeled with manifold learning. For each new video frame, the motion between the frame and the training background images is quickly estimated by interpolation on the background manifold, a background image with exactly the same viewpoint as the frame is synthesized on the fly, and the human (foreground) silhouette is finally extracted by background subtraction. The method can effectively extract human silhouettes from videos under camera translation, rotation, jitter, zoom, and their compound motions.
     A new adaptive silhouette feature extraction method is proposed to represent the human silhouettes extracted from video. As a 2D shape, a silhouette can be described by many kinds of shape features. We therefore first examine the performance of several widely used silhouette features in 3D pose reconstruction and draw a number of comparative conclusions. We then propose an adaptive feature extraction method that generates an optimal feature combination from traditional features through progressive combination and selection. Compared with the original silhouette features, the new feature is adaptive, i.e., it can reasonably choose the optimal features for the specific data involved in the problem at hand; it also has very low dimensionality, which reduces the computational complexity of subsequent steps.
     A generative method for 3D human motion recovery is proposed. The human silhouette is first analyzed to obtain the positions of the torso and the end sites; an optimization procedure then searches for the best 3D pose. Exploiting the characteristics of the human skeleton, we design an effective and computationally simple objective function together with an iterative optimization strategy, which greatly reduces the amount of computation. A novel pose-sequence recovery pipeline is also designed, overcoming drawbacks of traditional tracking methods such as error accumulation.
     A new distance metric between 3D poses, the Relational Geometric Distance, is proposed. A distance metric on 3D human poses can be used directly to quantitatively evaluate the accuracy of a 3D motion recovery algorithm. To make the metric better match human perception, we first define a pool of relational geometric features on 3D human poses that characterize the relations among body parts. The Adaboost algorithm is then used to select the most relevant features from the pool. Finally, the distance between two 3D poses is expressed as a weighted distance over the selected features. Compared with other traditional pose distances, the proposed distance better matches human perception of pose similarity.
     A discriminative method for 3D human motion recovery is proposed and implemented as a working system. A prior database is first constructed from known pose-silhouette exemplars. For the silhouettes extracted from the video, a k-nearest-neighbor search then returns a set of candidate poses for each frame. Finally, dynamic programming finds the optimal pose path through the per-frame candidate sets, recovering a continuous pose sequence. The method recovers 3D human motion from video fully automatically and in real time.
Video-based human motion analysis is an important research field in the computer vision and graphics communities, and recovering 3D human motion from markerless images or videos is an active subject within it, with immediate applications in 3D computer animation, motion capture, natural human-computer interaction, intelligent video surveillance, and so on. This thesis studies the following related techniques.
     A novel approach is proposed that extends classical background subtraction to extract silhouettes in real time from videos whose viewpoint varies dynamically due to camera movement. First, manifold learning is used to model the background under viewpoint variations. Then, for each new frame, a background image corresponding to the same viewpoint is synthesized on the fly by examining the local neighborhood on the manifold, and the silhouette is extracted via background subtraction. Experiments show that our approach can efficiently extract accurate silhouettes in complex situations.
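As a rough illustration of this pipeline, the sketch below synthesizes a viewpoint-matched background by embedding the training backgrounds with locally linear embedding and blending the nearest training images, then subtracts it from the frame. It is a minimal stand-in, not the thesis implementation: the non-parametric motion estimation and image registration steps are omitted, and the thresholded subtraction, helper names, and synthetic data are illustrative assumptions.

# Minimal sketch: viewpoint-aware background synthesis plus background subtraction.
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.neighbors import NearestNeighbors

def build_background_model(bg_images, n_neighbors=5, n_components=2):
    """bg_images: (N, H, W) grayscale training backgrounds under varying viewpoints."""
    n = bg_images.shape[0]
    flat = bg_images.reshape(n, -1).astype(np.float64)
    lle = LocallyLinearEmbedding(n_neighbors=n_neighbors, n_components=n_components,
                                 eigen_solver="dense")
    coords = lle.fit_transform(flat)                     # manifold coordinates of the backgrounds
    nn = NearestNeighbors(n_neighbors=n_neighbors).fit(coords)
    return {"flat": flat, "lle": lle, "nn": nn, "shape": bg_images.shape[1:]}

def synthesize_background(model, frame):
    """Blend the training backgrounds nearest to the frame on the manifold."""
    q = model["lle"].transform(frame.reshape(1, -1).astype(np.float64))
    dist, idx = model["nn"].kneighbors(q)
    w = 1.0 / (dist[0] + 1e-6)
    w /= w.sum()
    bg = (w[:, None] * model["flat"][idx[0]]).sum(axis=0)
    return bg.reshape(model["shape"])

def extract_silhouette(frame, background, thresh=30):
    """Plain background subtraction with a fixed threshold (no shadow handling)."""
    diff = np.abs(frame.astype(np.float64) - background)
    return (diff > thresh).astype(np.uint8)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    bgs = rng.integers(0, 255, size=(20, 60, 80)).astype(np.float64)  # stand-in backgrounds
    frame = bgs[3].copy()
    frame[20:40, 30:50] += 120                                        # synthetic "person" region
    model = build_background_model(bgs)
    mask = extract_silhouette(frame, synthesize_background(model, frame))
    print("foreground pixels:", int(mask.sum()))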
     We propose a new adaptive and compact silhouette feature to represent the silhouettes extracted in the previous step. We first examine a series of popular shape features in the context of 3D pose recovery, obtaining valuable insights into the choice of features. An adaptive and compact silhouette feature is then constructed by progressive combination and selection of traditional shape features. Compared with traditional features, the new feature is more effective for 3D pose recovery and has much lower dimensionality.
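The following sketch illustrates the idea of progressive combination and selection under simplifying assumptions: candidate descriptor blocks are greedily added as long as they reduce pose-regression error on held-out data. The descriptor pool, the ridge-regression scorer, and all names below are illustrative stand-ins rather than the feature construction actually used in the thesis.

# Minimal sketch: greedy forward selection over candidate silhouette descriptor blocks.
import numpy as np

def pose_error(train_x, train_y, val_x, val_y, lam=1e-3):
    """Ridge-regress pose from the selected descriptors and report validation error."""
    X = np.hstack([train_x, np.ones((len(train_x), 1))])
    W = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ train_y)
    Xv = np.hstack([val_x, np.ones((len(val_x), 1))])
    return float(np.mean(np.linalg.norm(Xv @ W - val_y, axis=1)))

def select_features(blocks_train, blocks_val, y_train, y_val, max_blocks=3):
    """blocks_*: dict name -> (N, d_i) array, one candidate descriptor per silhouette."""
    chosen, best_err = [], np.inf
    while len(chosen) < max_blocks:
        scored = []
        for b in (b for b in blocks_train if b not in chosen):
            names = chosen + [b]
            tr = np.hstack([blocks_train[n] for n in names])
            va = np.hstack([blocks_val[n] for n in names])
            scored.append((pose_error(tr, y_train, va, y_val), b))
        err, b = min(scored)
        if err >= best_err:          # stop once adding a block no longer helps
            break
        chosen.append(b)
        best_err = err
    return chosen, best_err

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    y_tr, y_va = rng.normal(size=(200, 30)), rng.normal(size=(50, 30))  # stand-in 3D poses
    mk = lambda n, d: rng.normal(size=(n, d))
    blocks_tr = {"fourier": mk(200, 20), "hu": mk(200, 7), "shape_ctx": mk(200, 60)}
    blocks_va = {"fourier": mk(50, 20), "hu": mk(50, 7), "shape_ctx": mk(50, 60)}
    print(select_features(blocks_tr, blocks_va, y_tr, y_va))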
     We propose a new generative 3D pose recovery method. First, the extracted silhouettes are analyzed to derive the 2D positions of the spine and end sites. Then, 3D poses are recovered by optimizing an objective function that encodes the correspondence between the analyzed silhouettes and a pose-parameterized 3D human skeleton. To reduce the computational cost, an effective and computationally efficient objective function is devised, and a novel iterative optimization process that exploits the structure of the human skeleton is proposed to accelerate the optimization. Experiments show that complex motions of a wide variety of types can be recovered by the proposed method.
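A toy version of the generative step is sketched below: joint angles of a small planar kinematic chain are optimized so that its joints match 2D positions assumed to come from silhouette analysis. The chain, bone lengths, least-squares objective, and generic Powell optimizer are illustrative assumptions; the thesis employs a full human skeleton with its own objective function and iterative optimization strategy.

# Toy sketch: optimize joint angles so a kinematic chain matches observed 2D positions.
import numpy as np
from scipy.optimize import minimize

BONES = np.array([0.5, 0.4, 0.3])            # stand-in bone lengths of a planar 3-link chain

def forward_kinematics(angles):
    """Return 2D positions of the chain's joints for the given relative joint angles."""
    pts, p, a = [np.zeros(2)], np.zeros(2), 0.0
    for length, da in zip(BONES, angles):
        a += da
        p = p + length * np.array([np.cos(a), np.sin(a)])
        pts.append(p)
    return np.array(pts)                      # (4, 2): base plus 3 joints

def objective(angles, targets):
    """Sum of squared distances between predicted joints and observed 2D positions."""
    return float(np.sum((forward_kinematics(angles)[1:] - targets) ** 2))

def recover_pose(targets, init=None):
    x0 = np.zeros(len(BONES)) if init is None else init
    res = minimize(objective, x0, args=(targets,), method="Powell")
    return res.x

if __name__ == "__main__":
    true_angles = np.array([0.3, -0.5, 0.8])
    targets = forward_kinematics(true_angles)[1:]   # pretend these came from silhouette analysis
    print("recovered angles:", np.round(recover_pose(targets), 3))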
     We propose a new perceptual pose distance, the Relational Geometric Distance. A distance metric on 3D poses can be used directly to evaluate the performance of a 3D motion recovery system. First, an extensive pool of relational geometric features containing a large number of potential features is defined, and the features effective for pose similarity estimation are selected by Adaboost. The selected features then form a pose distance function that generalizes to novel poses. Experiments show that our method outperforms others in emulating human perception of pose similarity.
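The sketch below shows the general form of such a distance under stated assumptions: a handful of hand-picked relational features (hand spread, stance width, hand height relative to the hip, torso uprightness) compared with a weighted Euclidean distance. The joint set, the example features, and the uniform default weights are placeholders; in the thesis the features are selected from a large pool and weighted via Adaboost against human similarity judgements.

# Minimal sketch: a pose distance over relational geometric features of body parts.
import numpy as np

JOINTS = ["hip", "l_hand", "r_hand", "l_foot", "r_foot", "head"]   # assumed joint set

def relational_features(pose):
    """pose: dict joint name -> 3D position. Returns a vector of body-part relations."""
    p = {k: np.asarray(v, dtype=float) for k, v in pose.items()}
    return np.array([
        np.linalg.norm(p["l_hand"] - p["r_hand"]),      # hand-to-hand spread
        np.linalg.norm(p["l_foot"] - p["r_foot"]),      # stance width
        p["l_hand"][1] - p["hip"][1],                   # left hand height relative to the hip
        p["head"][1] - p["hip"][1],                     # upright vs. bent torso
    ])

def relational_distance(pose_a, pose_b, weights=None):
    """Weighted Euclidean distance between the relational feature vectors of two poses."""
    fa, fb = relational_features(pose_a), relational_features(pose_b)
    w = np.ones_like(fa) if weights is None else np.asarray(weights, dtype=float)
    return float(np.sqrt(np.sum(w * (fa - fb) ** 2)))

if __name__ == "__main__":
    stand = {"hip": [0, 1, 0], "l_hand": [-0.3, 1.0, 0], "r_hand": [0.3, 1.0, 0],
             "l_foot": [-0.15, 0, 0], "r_foot": [0.15, 0, 0], "head": [0, 1.7, 0]}
    reach = dict(stand, l_hand=[-0.3, 1.8, 0.2])        # same pose with the left arm raised
    print("distance(stand, reach) =", round(relational_distance(stand, reach), 3))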
     We propose a new example-based 3D motion recovery method and implement it as a working system. First, a lookup database is constructed from silhouettes and their corresponding 3D poses. Then, for the silhouette extracted from each video frame, the database is queried in a k-nearest-neighbor fashion to obtain a list of candidate poses. Finally, dynamic programming finds the optimal pose path through the candidate lists. The proposed method recovers 3D motion automatically and in real time.
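A compact sketch of this lookup-and-smooth pipeline is given below: per-frame k-nearest-neighbor retrieval of candidate poses followed by a Viterbi-style dynamic program that trades off silhouette match against frame-to-frame pose smoothness. The feature dimensionality, the synthetic database, and the smoothness weight are assumptions for illustration only.

# Compact sketch: k-NN candidate retrieval per frame plus dynamic programming over frames.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def recover_sequence(frame_feats, db_feats, db_poses, k=5, smooth_w=1.0):
    nn = NearestNeighbors(n_neighbors=k).fit(db_feats)
    dists, idxs = nn.kneighbors(frame_feats)              # (T, k) match costs and candidate ids
    T = len(frame_feats)
    cost = dists.copy()                                   # accumulated cost per candidate
    back = np.zeros((T, k), dtype=int)                    # backpointers for path recovery
    for t in range(1, T):
        for j in range(k):
            # transition cost: pose change from every candidate at t-1 to candidate j at t
            trans = np.linalg.norm(db_poses[idxs[t - 1]] - db_poses[idxs[t, j]], axis=1)
            total = cost[t - 1] + smooth_w * trans
            back[t, j] = int(np.argmin(total))
            cost[t, j] = dists[t, j] + total[back[t, j]]
    # backtrack the optimal candidate path
    path = [int(np.argmin(cost[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    path.reverse()
    return np.array([db_poses[idxs[t, j]] for t, j in enumerate(path)])

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    db_feats = rng.normal(size=(500, 16))                 # silhouette features of database examples
    db_poses = rng.normal(size=(500, 30))                 # corresponding 3D poses (joint parameters)
    frames = db_feats[rng.integers(0, 500, size=12)] + 0.05 * rng.normal(size=(12, 16))
    print("recovered pose sequence shape:", recover_sequence(frames, db_feats, db_poses).shape)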