Activity Recognition Based on the Extraction and Analysis of Human Pose Sequences (基于人体姿态序列提取和分析的行为识别)


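English title: Activity Recognition Based on the Extraction and Analysis of Human Pose Sequences
Author: Chen Cong (陈聪)
Degree level: Doctoral
Discipline: Computer Application Technology
Chinese keywords: 时空分析 (spatial-temporal analysis) ; 梯度方向局部模式 (local pattern of oriented gradient) ; 姿态句子 (pose sentences) ; 姿态序列混合特征 (pose-sequence mixture feature)
English keywords: Spatial-Temporal Analysis ; Local Pattern of Oriented Gradient ; Pose Sentences ; Pose Array Based Mixture Feature
Degree year: 2012
Advisor: Min Huaqing (闵华清)
Discipline code: 081203
Degree-granting institution: South China University of Technology (华南理工大学)
Thesis submission date: 2012-06-01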

Abstract

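Estimating human pose from still images and recognizing human actions from video have long been active research topics in computer vision and pattern recognition, with broad application prospects. Both problems involve analyzing the features of the human body in images, and both must cope with interference from appearance variation, background clutter, illumination, and shadows, so they are closely related. This thesis combines techniques from the two problems: video analysis techniques from action recognition are used to improve the accuracy of pose estimation, and the pose sequences extracted from video are then used to describe human actions from different perspectives, improving action recognition. Comparative experiments on several public datasets verify the effectiveness of the proposed algorithms for pose estimation and action recognition. The main contributions are as follows:
     1. A video foreground segmentation algorithm based on spatio-temporal analysis is proposed. The algorithm applies Gabor filtering, borrowed from signal processing, to analyze how each pixel in the video changes over time. The video is coarsely partitioned according to the Gabor filtering result, and global and local color models describing the foreground and the background are built. Finally, the video is labeled at both the pixel level and the region level according to the color models, which allows foreground regions in complex videos to be segmented fairly accurately.
     2. A pose-sentence-based description of human pose is proposed. Human joints and limbs are first modeled uniformly as basic body parts. The pose space of each part is then divided into several classes according to pose features, each class forming a pose word, and associated pose words are combined into pose sentences that describe the full-body pose. The thesis also applies the LBP operator to the HOG descriptor, yielding the Local Pattern of Oriented Gradient descriptor. Building on the histogram of gradients, this descriptor compares the gradient magnitudes of different orientations within a neighborhood, which increases its sensitivity to changes in the gradient-orientation distribution across different body regions.
     3. Message passing is used to infer the human pose with maximum posterior probability in an image. To improve training efficiency, a latent-SVM-based training algorithm in manifold space is proposed. Very-high-dimensional pose feature vectors are projected into a lower-dimensional manifold space, which removes redundant and interfering information from the original feature vectors, and the model is then trained in the manifold space with an incremental latent support vector machine. The learning algorithm greatly improves training efficiency, reducing the time and storage needed for training, without compromising estimation performance.
     4. A human action recognition algorithm based on mixed pose-sequence features is proposed. First, cluster analysis is performed on the color distribution, position features, optical flow field, and other properties of the pixels in the human region of the video, to obtain limb masks that are used to extract the pose sequence. Second, the motion of limb endpoints in the video is modeled, and Kalman smoothing is used to correct the endpoint trajectories and smooth the pose sequence. Third, a series of action features based on the pose sequence is proposed, describing human actions from different perspectives. Finally, a mixture of conditional random field models is used to learn and recognize actions: a hidden conditional random field model is trained independently for each class of features, and the recognition results of the different models are then combined, improving the robustness and noise resistance of the algorithm.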
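The endpoint-smoothing step in item 4 can be illustrated with a short sketch. The abstract does not specify the motion model, so the Python code below assumes a constant-velocity model for a single 2-D limb endpoint and uses a standard Rauch-Tung-Striebel (RTS) Kalman smoother. The function name `kalman_smooth` and the noise parameters `q` and `r` are hypothetical choices made for illustration, not the thesis's implementation.

```python
import numpy as np

def kalman_smooth(track, q=1e-2, r=1.0):
    """RTS-smooth a 2-D point track of shape (T, 2).

    Assumed model (not from the thesis): state [x, y, vx, vy] with
    constant velocity, process noise q*I and measurement noise r*I.
    """
    z = np.asarray(track, dtype=float)
    n = len(z)
    F = np.eye(4)
    F[0, 2] = F[1, 3] = 1.0          # position += velocity each frame
    H = np.zeros((2, 4))
    H[0, 0] = H[1, 1] = 1.0          # only the position is observed
    Q = q * np.eye(4)
    R = r * np.eye(2)

    x_pred = np.zeros((n, 4))
    P_pred = np.zeros((n, 4, 4))
    x_filt = np.zeros((n, 4))
    P_filt = np.zeros((n, 4, 4))

    # Forward pass: ordinary Kalman filter.
    x = np.array([z[0, 0], z[0, 1], 0.0, 0.0])
    P = 10.0 * np.eye(4)
    for k in range(n):
        if k > 0:
            x = F @ x
            P = F @ P @ F.T + Q
        x_pred[k], P_pred[k] = x, P
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (z[k] - H @ x)
        P = (np.eye(4) - K @ H) @ P
        x_filt[k], P_filt[k] = x, P

    # Backward pass: RTS correction using future frames.
    x_smooth = x_filt.copy()
    for k in range(n - 2, -1, -1):
        C = P_filt[k] @ F.T @ np.linalg.inv(P_pred[k + 1])
        x_smooth[k] = x_filt[k] + C @ (x_smooth[k + 1] - x_pred[k + 1])
    return x_smooth[:, :2]
```

Calling `kalman_smooth(noisy_track)` on the per-frame positions of one endpoint returns a track of the same shape. Because the backward pass uses frames on both sides of each time step, it suits offline processing of a whole video rather than online tracking.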
Estimating human poses from still images and recognizing human actions from videos are challenging and widely studied tasks in the computer vision and artificial intelligence communities. They share a wide range of applications, such as human-computer interaction, image retrieval, surveillance, and sports video analysis. Both problems are concerned with analyzing the features of the human in the image, and both suffer interference not only from wide variation in appearance due to skin color and clothing, but also from environmental factors such as background clutter, illumination differences, and shadows. The two problems are related, and we combine them in this thesis: we apply video analysis techniques to improve human pose estimation from videos, and employ the extracted pose sequences to describe human actions from multiple views. The main research problems and contributions are as follows:
     1. A new algorithm is proposed to segment the foreground from videos based on analysis of the spatial-temporal information in the video. First, by regarding the change of a single pixel over time as a discrete-time signal, the video is coarsely segmented into foreground and background by applying a Gabor filter in the temporal domain. Second, a global color model and a local color model are defined and built by clustering the color information of the background and foreground. Finally, a double-labeling method based on the pixel level and the region level is employed for fine segmentation of the foreground.
     2. We propose a method for describing human poses based on pose-sentences. Believing that explicitly using joint features in pose estimation can improve performance, we model limbs and joints as components of the human body. As the poses of parts indicate the presence of full-body poses, we split the pose space of each body part into several classes called pose-words and use combinations of pose-words, named pose-sentences, to describe full-body poses. To describe the image information robustly, we propose a new image descriptor called Local Pattern of Oriented Gradient, which applies the LBP operator to the Histogram of Oriented Gradients. This descriptor is sensitive to variation in the distribution of orientations across neighboring cells and can capture differences in gradient orientation among different human parts.
     3. We apply the efficient message-passing belief propagation method to search for the pose with maximum posterior probability in the image. As the dimension of the pose feature vectors is usually in the tens of thousands, we project them into a low-dimensional embedding space to discard noise and redundant information. An incremental Latent Support Vector Machine is applied to train the model in the embedding space. We find that the dimension reduction can significantly reduce memory use and training time while only slightly affecting performance. We also investigate the performance of several linear and non-linear dimensionality reduction methods on the pose features and find that Orthogonal Linear Graph Embedding outperforms the other methods most of the time.
     4. We propose a method for recognizing human actions based on human pose sequences extracted from videos. Given a video, limb masks are extracted by clustering image features and motion features in the human region; the limb masks help reduce interference from the background and partially address the “double-counting” problem during pose estimation. Then, the extracted pose sequence is smoothed with a Kalman smoother to remove noise and make the pose changes consistent. Multiple kinds of feature sequences based on the pose information are extracted to describe human actions from different views. Finally, we apply a mixture of HCRFs to recognize the human actions: we train one HCRF for each kind of feature sequence and combine the conditional probabilities from the different feature sequences to improve recognition accuracy.
     Experiments on several public benchmark datasets demonstrate the effectiveness of the proposed methods for foreground segmentation, pose estimation, and action recognition.
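As a rough illustration of the coarse segmentation step in contribution 1, the following Python sketch treats each pixel's intensity over time as a 1-D signal, filters it with a complex temporal Gabor kernel, and marks pixels with high response energy as candidate foreground. The kernel parameters (`sigma`, `freq`), the mean-energy threshold, and the function names are assumptions made for illustration; the abstract does not give the thesis's actual filter settings or thresholding rule.

```python
import numpy as np

def temporal_gabor_kernel(sigma=2.0, freq=0.25):
    """Complex 1-D Gabor kernel: Gaussian envelope times a complex sinusoid."""
    radius = int(np.ceil(3 * sigma))
    t = np.arange(-radius, radius + 1, dtype=float)
    envelope = np.exp(-t ** 2 / (2 * sigma ** 2))
    kernel = envelope * np.exp(2j * np.pi * freq * t)
    # Subtract a scaled envelope so the kernel sums to zero and
    # temporally constant pixels produce (near) zero response.
    return kernel - envelope * (kernel.sum() / envelope.sum())

def coarse_foreground_mask(video, sigma=2.0, freq=0.25, thresh=None):
    """video: grayscale array of shape (T, H, W); returns a bool (H, W) mask.

    T must be at least the kernel length (2 * ceil(3 * sigma) + 1).
    """
    video = np.asarray(video, dtype=float)
    n_frames, height, width = video.shape
    kernel = temporal_gabor_kernel(sigma, freq)
    signals = video.reshape(n_frames, -1)        # one column per pixel
    # Filter every pixel's time series; "valid" avoids zero-padding edges.
    response = np.apply_along_axis(
        lambda s: np.convolve(s, kernel, mode="valid"), 0, signals)
    energy = np.mean(np.abs(response) ** 2, axis=0)
    if thresh is None:
        thresh = energy.mean()                   # assumed, ad hoc threshold
    return (energy > thresh).reshape(height, width)
```

The resulting mask is only a coarse split. In the pipeline described above it would seed the global and local color models, which then drive the finer pixel-level and region-level labeling.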

引文

[1]C.Stauffer, W.E.L.Grimson. Adaptive background mixture models for real-time tracking[A]. InProceedings of the IEEE International Conference on Computer Vision and Pattern Recognition,1999.
    [2]H.L.Eng，K.A.Toh, A.H.Kam，J.Wang, W.Y.Yau. An automatic drowning detection surveillance systemfor challenging outdoor pool environments[A]. In Proceedings of the IEEE International Conference onComputer Vision,2003,pp:532-539.
    [3]H. Wang，D. Suter. A Novel Robust Statistical Method for Background Initialization and VisualSurveillance[A]. In Proceeding sof the Asian Conference on Computer Vision,2006,3851,:328-337.
    [4]A.Elgammal, D.Harwood, L.Davis. Non-parametric model for background subtraction [A]. InProceedings of the Europeon Conference on Computer Vision,2000:751-767.
    [5]M.Heikkila, M.Pietikainen. A texture-based method for modeling the background and detecting movingobjects[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence,2006,28(4),:657-662.
    [6]P.Figueroa, N.Leite, R.M.L.Barros. Tracking soccer players using the graph representation[A]. InProceedings of the IEEE International Conference on Computer Vision and Pattern Recognition,2004,4,:787-790.
    [7]K.Kim, T.H.Chalidabhongse, D.Harwood, L.Davis. Real-time foreground-background segmentationusing codebook model[J]. Real-time imaging,2005,11(3),:172-185.
    [8]M.B.Capellades, D.Doermann, D.DeMenthon, R.Chellappa. An appearance based approach for humanand object tracking[A].In Proceedings of the IEEE International Conference on ImageProcessing,2003,3,:85-88.
    [9]K.Okuma, A.Taleghani, N.D.Freitas, J.J.Little, David G.Lowe. A boosted particle filter: multitargetdetection and tracking[A].In Proceedings of the Europeon Conference on Computer Vision,2004,3021,:28-39.
    [10]N.Robertson, I.Reid. Behaviour understanding in video:a combined method[A]. In Proceedings of theIEEE International Conference on Computer Vision,2005,1,:808-815.
    [11]H.Yi, D.Rajan, L.-T.Chia. A new motion histogram to index motion content in videosegments[J].Pattern Recognition Letters,2004,26,:1221-1231.
    [12]M.Hu, W.Hu, T.Tan. Tracking people through occlusion[A]. In Proceedings of the IEEE InternationalConference on Pattern Recognition,2004,2,:724-727.
    [13]S.Khan, M.Shah. Tracking people in presence of occlusion[A].In Proceedings of the Asian Conferenceon Computer Vision,2000,:1132-1137.
    [14]S.Park, J.K.Aggarwal. Simultaneous tracking of multiple body parts of interacting persons[J].ComputerVision and Image Understanding,2006,102(1),:1-21.
    [15]I.B.Ozer, W.H.Wolf, A hierarchical human detection system in (Un) compressed domains[J].IEEETransactions on Multimedia,2002,4(2),:283-300.
    [16]A Utsumi,, N Tetsutani, Human detection using geometrical pixel value structures [A]. InProceedings of the IEEE International Conference on Automatic Face and Gesture Recognition,2002,:34-39.
    [17] N. Dalal and B. Triggs. Histograms of Oriented Gradients for Human Detection[A]. Proc of the IEEEInternational Conference on Computer Vision and Pattern Recognition. Miami, FL, USA. vol.1,2005, pp.886-893.
    [18]L.Davis, V.Philomin, R.Duraiswami. Tracking humans from a moving platform [A]. In Proceedings ofthe IEEE International Conference on Pattern Recognition,2000,4,:171-178.
    [19]A.Koschan, S.Kang, J.Paik, B.Abidi, M.Abidi. Color active shape models for tracking non-rigidobjects[J].Pattern Recognition Letters,2003,24,:1751-1765.
    [20]N.Atsushi, K.Hirokazu, H.Shinsaku, I.Siji. Tracking multiple people using distributed visionsystems[A]. In Proceedings of the IEEE International Conference on Robotics and Automation,2002,3,:2974-2981.
    [21]B.Wu, R.Nevatia. Detection of multiple, partially occluded humans in a single image by bayesiancombination of edgelet part detection[A].In Proceedings of the IEEE International Conference
    [22]Y.A.Ivanov, A.F.Bobick, J.Liu. Fast lighting independent background subtraction[J]. InternationalJournal of Computer Vision,2000,37(2),:199-207.
    [23]A.Mittal, L.S. Davis. M2Tracker: a multi-view approach to segmenting and tracking people in acluttered scene using regionbased stereo[J].International Journal of Computer Vision2003,51(3),:189-203.
    [24]A.Mittal, L.S.Davis. M2Tracker: a multi-view approach to segmenting and tracking people in acluttered scene using regionbased stereo[A].In Proceedings of the Europeon Conference on ComputerVision,2002,2350,:18-33.
    [25]D.B.Yang, H.H.G.Banos, L.J.Guibas. Counting people in crowds with a real-time network of simpleimage sensors[A].In Proceedings of the IEEE International Conference on ComputerVision,2003,1,:122-129.
    [26]S.Iwase, H.Saito, Parallel tracking of all soccer players by integrating detected positions in multipleview images[A]. In Proceedings of the IEEE International Conference on Pattern Recognition,2004,4,:751-754.
    [27] K Grauman, G Shakhnarovich, T Darrell. Inferring3D structure with a statistical image-based shapemodel[A]. In Proceedings of the IEEE International Conference on Computer Vision,2003,3,:641-647.
    [28]A Agarwal, B Triggs. Recovering3D human pose from monocular images[A]. IEEE Transactions onPattern Analysis and Machine Intelligence,2006,28,:44-58.
    [29]M Brand.Shadow puppetry[A].In Proceedings of the IEEE International Conference on ComputerVision,1999,2,:1237-1244.
    [30]M.A. Elgammal, C.S. Lee.Inferring3D body pose from silhouettes using activity manifoldlearning[A].In Proceedings of the IEEE International Conference on Computer Vision,2004,2,:681-688.
    [31]T. Tangkuampien, D. Suter. Real-time human pose inference using kernel principal componentpre-image approximations[A]. In Proceedings of the British Machine Vision Conference,2006,2,:599–608.
    [32]R. Okada, S. Soatto.Relevant Feature Selection for Human Pose Estimation and Localization inCluttered Images[A].In Proceedings of the Europeon Conference on Computer Vision,2008,5303,:434-445.
    [33]N. Ikizle, R.G. Cinbis, S. Sclaroff. Learning Actions From theWeb[A].In Proceedings of the IEEEInternational Conference on Computer Vision,2009,:995-1002.
    [34]L. Taycher, G. Shakhnarovich, D. Demirdjian, T. Darrell. Conditional random people: Tracking humanswith crfs and grid filters[A].In Proceedings of the IEEE International Conference on Computer Vision andPattern Recognition,2006,1,:222-229.
    [35]H. Ning, W. Xu, Y. Gong, and T. Huang. Latent Pose Estimator for Continuous ActionRecognition[A].In Proceedings of the Europeon Conference on Computer Vision,2008,5303,:419-443.
    [36]P Viola,，M Jones Rapid. object detection using a boosted cascade of simple features[A]. InProceedings of the IEEE International Conference on Computer Vision and Pattern Recognition,2001,1,:511-518.
    [37]T.B.Moeslund, E.Granum. Motion capture of articulated chains by applying auxiliary information tothe sequential Monte Carlo algortihm[A].In Proceedings of International Conference onVisualization,Imaging and Image Processing,2004,1,:13-15.
    [38]T.J. Roberts, S.J. McKenna, I.W. Ricketts. Human pose estimation using learnt probabilistic regionsimilarities and partial configurations[A].In Proceedings of the Europeon Conference on Computer Vision,2004,3024,:291-303.
    [39]R.Ronfard, C.Schmid, B.Triggs. Learning to parse pictures of people[A].In Proceedings of theEuropeon Conference on Computer Vision,2002,3024,:291-303.
    [40]A.Micilotta, E.Ong, R.Bowden. Detection and tracking of humans by probabilistic body partassembly[A].In Proceedings of the British Machine Vision Conference,2005,2353,:700-714.
    [41]K. Mikolajczyk, D.Schmid, A.Zisserman. Human detection based on a probabilistic assembly of robustpart detectors[A].In Proceedings of the Europeon Conference on Computer Vision,2008,5303,:419-443.
    [42]D.Ramanan, D.A.Forsyth, A.Zisserman.Strike a pose:tracking people by finding stylizedposes[A].IEEE Transactions on Pattern Analysis and Machine Intelligence,2005,1,:271-278.
    [43]S.Johnson, M.Everingham. Combining discriminative appearance and segmentation cues for articulatedhuman pose estimation[A].Workshops of the IEEE International Conference on Computer Vision,2009,:405-412.
    [44]X.Ren, A.Berg, J.Malik. Recovering human body configurations using pairwise constraints betweenparts[A]. In Proceedings of the Europeon Conference on Computer Vision,2005,1,:824-831.
    [45]P.Felzenszwalb and D.Huttenlocher. Pictorial Structures for Object Recognition [A].InternationalJournal of Computer Vision,2005,61(1),:55-79.
    [46]D.Ramanan. Learning to parse images of articulated bodies [J]. Advances in Neural InformationProcessing Systems,2007.
    [47]J.Carranza, C.Theobalt, M.Magnor, H..P.Seidel. Free-viewpoint video of human actors [J]. ACMSIGGRAPH,2002,22(3).
    [48]M.W.Lee, I.Cohen. Proposal maps driven MCMC for estimating human body pose in static images[A].In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition,2004,2,:334-341.
    [49]E.J.Ong, A.Hilton. Learnt inverse kinematics for animation synthesis[A].In Proceedings of Conferenceon Vision, Video and Graphics,2005,23(3),:472–483.
    [50]K. Grochow, S.L. Martin, A. Hertzmann, Z. Popovic. Style-based inverse kinematics[J].ACMTransactions on Computer Graphics,2004,vol22(3),:569.
    [51]J.Chen, M.Kim,Y. Wang, Q.Ji. Switching Gaussian Process Dynamic Models for simultaneouscomposite motion tracking and recognition[A]. In Proceedings of the IEEE International Conference onComputer Vision and Pattern Recognition,2009,:2655-2662.
    [52]H.Sidenbladh, M.J.Black. Learning the statistics of people in images and video [J].International Journalof Computer Vision,2003,vo54(3),:183-209.
    [53]I.A.Karaulova, P.M.Hall, A.D.Marshall. A hierarchical models of dynamics for tracking people with asingle video camera[A]. In Proceedings of the British Machine Vision Conference,2000.
    [54]A.F.Bobick, J.W.Davis. The recognition of human movement using temporal templates[J]. IEEETransactions on Pattern Analysis and Machine Intelligence,2001,23(3),pp:257-267.
    [55]T.F.Syeda-Mahmood, M.Vasilescu, S.Sethi. Recognizing action events from multiple viewpoints[A].IEEE Workshop on Detection and Recognition of Events in Video,2001,:64-72.
    [56]Z. Lin, Z. Jiang, D.S. Larry. Recognizing actions by shape-motion prototype trees[A].In Proceedings ofthe IEEE International Conference on Computer Vision,2009,:444-451.
    [57]Z.Zhang, Y.Hu, S.Chan, L.T. Chia. Motion Context: A New Representation for Human ActionRecognition[A].In Proceedings of the Europeon Conference on Computer Vision,2008,5305,,:817-829.
    [58]D. Weinland, R. Ronfarda, E. Boyera. Free viewpoint action recognition using motion historyvolumes[J]. Computer Vision and Image,2006,104(2),:249-257.
    [59]A.Yilmaz,M.Shah. Actions sketch: A novel action representation[A]. In Proceedings of the IEEEInternational Conference on Computer Vision and Pattern Recognition,2005,1,:984-989.
    [60]M.Ahmad, S.W.Lee. Human action recognition using shape and CLG-motion flowfrommulti-viewimage sequences[J].The Journal of Pattern Recognition Society,2008,41,:2237-2252.
    [61]R.A.Young, R.M.Lesperance, W.W.Meyer. The Gaussian derivative model for spatial temporal vision: I.Cortical model[J].Spatial Vision,2001,14,:261-319.
    [62]H.Jhuang, T.Serre, L.Wolf, and T.Poggio. A biologically inspired system for action recognition[A].InProceedings of the IEEE International Conference on Computer Vision,2007,:1-8.
    [63]O.Chomat,J.L.Crowley. Probabilistic recognition of activity using local appearance[A]. In Proceedingsof the IEEE International Conference on Computer Vision and Pattern Recognition,1999,2,:104-109.
    [64]L.Zelnik-Manor,M.Irani. Event-based analysis of video[A]. In Proceedings of the IEEE InternationalConference on Computer Vision and Pattern Recognition,2001,2,:123-130.
    [65]I.Laptev. On space-time interest points [J].IEEE Transactions on Computer Vision,2005,64,:107-123.
    [66]M. Marszalek,I.Laptev., C.Schmid. Actions in Context[A]. In Proceedings of the IEEE InternationalConference on Computer Vision and Pattern Recognition,2009,:2929-2936.
    [67]P.Dollr, V.Rabaud, G.Cottrell, S.Belongie. Behavior recognition via sparse spatio-temporal features[A].In Proceedings of the IEEE International Conference onVisual Surveillance and Performance Evaluation ofTracking and Surveillance,2005,:65-72.
    [68]G.Willems, T.Tuytelaars,L.Van Gool. An efficient dense and scale-invariant spatio-temporal interestpoint detector[A].In Proceedings of the Europeon Conference on Computer Vision,2008,5303,:650-663.
    [69]N.Oliver,B.Rosario,A.Pentland. A Bayesian computer vision system for modeling humaninteractions[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence,2000,22(8),:831-843
    [70] D.J. Moore, I.A.Essa,MH Hayes. Exploiting human actions and object context for recognitiontasks[A]. In Proceedings of the Seventh IEEE International Conference on Computer Vision,1999,1,:80-86.
    [71] S. Hongeng, R. Nevatia. Large-scale event detection using semi-hidden Markov models[A]. InProceedings of the Seventh IEEE International Conference on Computer Vision,2003,:1455-1462.
    [72] S.Luhr, HH Bui, S Venkatesh. West GAW. Recognition of human activity through hierarchicalstochastic learning [A]. In Proceedings of the First IEEE International Conference on Pervasive Computingand Communications, Fort Worth, Texas, USA,2003,:416-422.
    [73] NT Nguyen, DQ Phung, S Venkatesh, H Bui. Learning and detecting activities from movementtrajectories using the hierarchical hidden Markov model[A]. In Proceedings of the IEEE Computer SocietyConference on Computer Vision and Pattern Recognition, San Diego, CA, USA,2005(2),:955-960.
    [74] H Ren, GY Xu. Human action recognition with primitive-based coupled-HMM [A]. In Proceedings ofthe16th International Conference on Pattern Recognition,2002(2),:494-498.
    [75]C Sminchisescu, A Kanaujia, Z Li. Conditional models for contextual human motion recognition[A].Tenth IEEE International Conference on Computer Vision, Washington, DC, USA,2005,:1808–1815.
    [76]T Wang, JiG Li, Q Diao, W Hu. Semantic Event Detection using Conditional Random Fields[A]. IEEEConference on Computer Vision and Pattern Recognition Workshop,2006,:109-109
    [77]褚一平，张引，叶修梓，张三元.基于隐条件随机场的自适应视频分割算法[J].自动化学报.2007,33(12),:1252-1258.
    [78] I Junejo, E Dexter, I Laptev, P Perez. View-Independent Action Recognition from TemporalSelf-Similarities[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence,2010,99,:1-1
    [79] Y Wang G Mori. Learning a Discriminative Hidden Part Model for Human ActionRecognition[J].Advances in Neural Information Processing Systems,2008,
    [80] P Remagnino, T Tan, K Baker. Agent orientated annotation in model based visual surveillance[A].Sixth International Conference on Computer Vision,1998,:857-862
    [81] S Intille,A Bobick. A framework for recognizing multiagent action from visual evidence[A]. InProceedings of the National Conference on Artificial Intelligence,1999,518–525.
    [82] A Gupta, LS Davis. Objects in Action: An Approach for Combining Action Understanding and ObjectPerception[A]. IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, Minnesota,USA,2007,:1-8
    [83] S.G Gong, T Xiang. Recognition of group activities using dynamic probabilistic networks[A]. InProceedings of Ninth IEEE International Conference on Computer Vision,2003(2).,742-749
    [84]Y Luo, T Wu, J Hwang. Object-based analysis and interpretation of human motion in sports videosequences by dynamic bayesian networks[J] Compute Vision and Image
    [85]C.Bregler. Learning and recognizing human dynamics in video sequences[A]. In Proceedings of theIEEE International Conference on Computer Vision and Pattern Recognition,1997,:568-574.
    [86]B.North, A.Blake, M.Isard, J.Rittscher. Learning and classification of complex dynamics[J]. IEEETransactions on Pattern Analysis and Machine Intelligence,2000,22(9),:1016-1034.
    [87]V.Pavlovic, J.M.Rehg, J.MacCormick. Learning switching linear models of human motion[J].Advances in Neural Information Processing Systems,2000,:981-987.
    [88]V. Pavlovic and J. M. Rehg, Impact of dynamic model learning on classification of human motion[A].In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition,2000,:1788-1795.
    [89] Y Li, TS Wang, H Y Shum. Motiont exture: a two levelst atistical model for charater motionsynthesis[J]. ACM Transactions on Graphics,2002,21(3),:465-472。
    [90]M.S.Ryoo, J.K.Aggarwal. Recognition of composite human activities through context-free grammarbased representation[A]. In Proceedings of the IEEE International Conference on Computer Vision andPattern Recognition,2006,:1709-1718.
    [91]D.Moore,I. Essa. Recognizing multitasked activities using stochastic context-free grammar. in IEEEInternational Conference on Computer Vision and Pattern Recognition Workshop on Models vs Exemplarsin Computer Vision,2001
    [92]M.Yamamoto, H.Mitomi, F.Fujiwara. Bayesian classification of task-oriented actions based onstochastic context-free grammar[A]. In Proceedings of the Conference on CConference on Auto-matic Faceand Gesture Recognition,2006,:317-323.
    [93]D.Minnen, I.Essa, T.Starner.Expectation grammars: leveraging high-level expectations for activityrecognition[A]. In Proceedings of the IEEE International Conference on Computer Vision and PatternRecognition,2003,2:626-632.
    [94]A.Ogale, A.Karapurkar, Y. Aloimonos. View-invariant modeling and recognition of human actionsusing grammars[A]. In IEEE Workshop on Dynamical Vision,2005.
    [95]D.Lymberopoulos, A.S.Ogale, A.Savvides, Y. Aloimonos. A sensory grammar for inferring behaviors insensor networks[A]. In Proceedings of the IEEE International Conference on Information processing insensor networks,2006,:251-259.
    [96]Z.Zhang, K.Huang, T.Tan. Multi-thread parsing for recognizing complex events in videos.[A].InProceedings of the Europeon Conference on Computer Vision,5304,:738-751.Cputer Vision,.
    [97]S.W.Joo, R. Chellappa. Attribute grammar-based event recognition and anomaly detection[A]. InProceedings of the IEEE International Conference on Computer Vision and Pattern Recognition,2006,:107-107.
    [98]A.Borzin, E.Rivlin, M.Rudzsky. Surveillance interpretation using Generalized Stochastic PetriNets[A]. In Proceedings of the IEEE International Conference on Image Analysis for MultimediaInteractive Services,2007,:390-397.
    [99]N.M.Ghanem. Petri Net models for event recognition in surveillance videos. Ph.D. dissertation,University of Maryland,2007.
    [100]M.Albanese, V.Moscato, R.Chellappa, A.Picariello. A constrained probabilistic petrinet framework forhuman activity detection in video[J]. IEEE Transactions on Multimedia,2008,:982-996.
    [101]Y..Nam, N.Wohn, H.Lee-Kwang. Modeling and recognition of hand gesture using colored PetriNets[J]. IEEE Transactions on Systems, Man, and Cybernetics, Part A,1999,29(5),:514-521.
    [102] P. KaewTraKulPong，R.Bowden. An improved adaptive background mixture model for real-timetracking with shadow detection[A]. Proc of the2nd European Workshop on Advanced Video-BasedSurveillance Systems. Kingston，UK，2001：149-158.
    [103] A Elgammal，R Duraiswami，D Harwood，etal. Background and foreground modeling usingnonparametric kernel density estimation for visual surveillance[A]. In Proceedings of the IEEE，2002，90(7)：1151-1163
    [104] J Migdal，L W Eric. Background subtraction using markov thresholds[A]. IEEE Workshop onMotion and Video Computing. Breckenridge，CO，USA，2005：58-65
    [105] S Yaser， S Mubarak. Bayesian modeling of dynamic scenes for object detection[J]. IEEETransactions on Pattern Analysis and Machine Intelligence，2005，27(11)：1778-1792
    [106] HSM Beigi, SH Maes. A hierarchical approach to large-scale speaker recognition[A].Proc of the6thEuropean Conference on Speech Communication and Technology. Budapest，Hungary，1999：2203-2206
    [107]陈成，庄越挺，肖俊.相机运动条件下的视频前景提取[J].浙江大学学报（工学版），2009，43(6):973-982.
    [108] K Fukunaga，L Hostetler. The Estimation of the Gradient of a Density Function, with Applications inPattern Recognition[J]. IEEE Transactions on Information Theory，1975，21(1)：32-40
    [109] YZ Cheng. Mean shift, mode seeking, and clustering[J]. IEEE Transactions on Pattern Analysis andMachine Intelligence，1995，17(8):790-799
    [110] D Comaniciu，M.P. Peter. Mean Shift: A robust approach toward feature space analysis[J]. IEEETransactions on Pattern Analysis and Machine Intelligence，2002，24(5)：602-619
    [111] M.Blank，L.Gorelick，E Shechtman，etal. Actions as space-time shapes[J]. IEEE Transactions onPattern Analysis and Machine Intelligence，2007，29(12)：2247-2253
    [112] P. Felzenszwalb and D. Huttenlocher. Pictorial structures for object recognition[J]. InternationalJournal of Computer Vision,61(2005):55-79.
    [113] S. Leonid, B. Michae J, Predicting3D people from2D pictures[A]. Proc of4th InternationalConference on Articulated Motion and Deformable Objects. Mallorca, Spain,2006, pp.185-195.
    [114] S. Leonid, B. Michael J, Measure Locally, Reason Globally: Occlusion-sensitive Articulated PoseEstimation[A]. Proc of IEEE Conference on Computer Vision and Pattern Recognition. New York, NY, US,2006, pp.2041-2048.
    [115] F. Vittorio, M. Manuel, Z. Andrew, Progressive search space reduction for human pose estimation[A].Proc of the IEEE International Conference on Computer Vision and Pattern Recognition. Anchorage, AK,vol2,2008, pp.1-8.
    [116] F. Vittorio, M. Manuel, Z. Andrew, Pose Search: retrieving people using their pose[A]. Proc of theIEEE International Conference on Computer Vision and Pattern Recognition. Miami, FL, USA.2009, pp.1-8.
    [117] Z Long, C Yuanhao, L Yifei, L Chenxi, Y Alan. Max Margin AND/OR Graph Learning for Parsingthe Human Body[A]. Proc of the IEEE International Conference on Computer Vision and PatternRecognition. Anchorage, AK,2008, pp.1-8.
    [118] W. Yang, M. Greg, Multiple tree models for occlusion and spatial constraints in human poseestimation[A]. Proc of10th European Conference on Computer Vision. Marseille, France.2008, pp.710-724.
    [119] J. Sam, E. Mark, Clustered Pose and Nonlinear Appearance Models for Human Pose Estimation[A].Proc of the21st British Machine Vision Conference. Aberystwyth,Wales, UK,2010, pp.434-441.
    [120] J. Sam, E. Mark, Learning Effective Human Pose Estimation from Inaccurate Annotation[A]. Procof the IEEE International Conference on Computer Vision and Pattern Recognition. Colorado Springs,USA.2011, pp.1465-1472.
    [121]Y. Yang, R. Deva. Articulated pose estimation with flexible mixtures-of-parts[A]. Proc of the IEEEInternational Conference on Computer Vision and Pattern Recognition. Colorado Springs，CO, Unitedstates.2011, pp.1385-1392.
    [122] K. Paul, M. Dimitrios, N. Jean-Christophe, Integration of Bottom-Up/Top-Down Approaches for2DPose Estimation Using Probabilistic Gaussian Modelling[J]. Computer Vision and Image Understanding,115(2),2011, pp:242-255.
    [123] X. Lan and D. Huttenlocher. Beyond trees: Common-factor models for2D human pose recovery[A].In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, vol1,2005, pp:470–477.
    [124] M. Eichner and V. Ferrari, Better appearance models for pictorial structures[A]. Proc of the20stBritish Machine Vision Conference. Rama Chellappa, UK,2009
    [125] L. Zhe, D. Larry S, D. David, D. Daniel. An Interactive Approach to Pose-Assisted andAppearance-based Segmentation of Humans[A]. Proc of11th International Conference on Computer Vision.Rio de Janeiro, Brazil.2007, pp.1-8.
    [127] H. Michael, G. Dariu M.,3D Human model adaptation by frame selection and shape–textureoptimization[J]. Computer Vision and Image Understanding,115(11),2011,:1559-1570.
    [128] K. Roland, G. Luc Van, Markerless tracking of complex human motions from multiple views[J].Computer Vision and Image Understanding,104(2-3),2006:190-209.
    [129] M. Andriluka, S. Roth, and B. Schiele Pictorial structures revisited: People detection and articulatedpose estimation[A]. Proc of the IEEE International Conference on Computer Vision and PatternRecognition. Miami, FL, USA.2009, pp.1014-1021.
    [130] P, Buehler.,M, Everingham,.D.P Huttenlocher,. and Zisserman, Long term arm and hand trackingfor continuous sign language TV broadcasts[A]. Proc of the19st British Machine Vision Conference. Leeds,UK,2008, pp.265-274
    [131] SB Wang，A Quattoni，LP Morency. Hidden Conditional Random Fields for Gesture Recognition[A].IEEE Computer Society Conference on Computer Vision and Pattern Recognition,2006,:1521-1527.
    [132] G David. Lowe, Object recognition from local scale-invariant features[A]. International Conferenceon Computer Vision, Corfu, Greece,1999, pp.1150-1157.
    [133] S Belongie, J Malik and J Puzicha. Shape Matching and Object Recognition Using Shape Contexts[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2002,24(4):509-552.
    [134] T. Duan, F. D.A,. Configuration estimates improve pedestrian finding[A]. Proc of Neural InformationProcessing Systems. Vancouver, British Columbia, Canada.2007, pp.1529–1536.
    [135] S. V. Kumar, N. Ram, H. Chang, Efficient inference with multiple heterogeneous part detectors forhuman pose estimation[A]. Proc of11th European Conference on Computer Vision. Heraklion, Crete,Greece.2010, pp.314-327.
    [136] F. Pedro F, G. Ross B., M. David, R. Deva, Object Detection with Discriminatively TrainedPart-Based Models[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence,32(9)2010,pp:1627-1645.
    [137] H. M.B., M. T.B., F. P., View-invariant gesture recognition using3D optical flow and harmonicmotion context[A]. Computer Vision and Image Understanding,114(12),2010, pp.1353-1361.
    [138] C. Cheng, Y. Y, N. Feiping, O. Jean-Marc,3D human pose recovery from image by efficient visualfeature selection[J]. Computer Vision and Image Understanding,115(3),2011,pp:290-299.
    [139] A. Marijke F, C. Laura E,. Performance evaluation of texture measures with classification based onKullback discrimination of distributions[A]. Proc of12th International Conference on Pattern Recognition.Vienna, Austria.1994, pp.582-585.
    [140] X. Wang, T. X. Han, and S. Yan. An HOG-LBP Human Detector with Partial Occlusion Handling[A]. IEEE International Conference on Computer Vision, Kyoto, 2009.
    [141] X. He, P. Niyogi. Locality Preserving Projections[A]. Proc. of 16th Advances in Neural Information Processing Systems. Vancouver, Canada, 2003, pp. 153-160.
    [142] D. Cai, X. He, J. Han. Document Clustering Using Locality Preserving Indexing[J]. IEEE Transactions on Knowledge and Data Engineering, 17(12), 2005, pp. 1624-1637.
    [143] D. Cai, X. He. Orthogonal Locality Preserving Indexing[A]. The 28th Annual International ACM SIGIR Conference. Salvador, Brazil, 2005, pp. 522-536.
    [144] D. Cai, X. He, J. Han, H.-J. Zhang. Orthogonal Laplacianfaces for Face Recognition[J]. IEEE Transactions on Image Processing, 15(11), 2006, pp. 3608-3614.
    [146] D. Tran, D. Forsyth. Improved human parsing with a full relational model[A]. Proc. of 11th European Conference on Computer Vision. Heraklion, Crete, Greece, 2010, pp. 227-240.
    [147] A. A. Efros, A. C. Berg, G. Mori, J. Malik. Recognizing action at a distance[A]. IEEE International Conference on Computer Vision, 2003, pp. 726-733.
    [148] J. C. Niebles, H. Wang, and L. Fei-Fei. Unsupervised learning of human action categories using spatial-temporal words[J]. International Journal of Computer Vision, 79(3), 2008, pp. 299-318.
    [149] P. Scovanner, S. Ali, M. Shah. A 3-dimensional SIFT descriptor and its application to action recognition[A]. 15th International Conference on Multimedia, 2007.
    [150] A. Kläser, M. Marszałek, C. Schmid. A spatio-temporal descriptor based on 3D-gradients[A]. Proceedings of the British Machine Vision Conference. Leeds, UK, 2008.
    [151] Y. Xie, H. Chang, Z. Li, L. Liang, X. Chen, D. Zhao. A unified framework for locating and recognizing human actions[A]. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, 2011, pp. 25-32.
    [152] V. Ferrari, M. Marín-Jiménez, A. Zisserman. 2D Human Pose Estimation in TV Shows[A]. Dagstuhl Seminar on Statistical and Geometrical Approaches to Visual Motion Analysis, 2009.
    [153] C. Thurau, V. Hlavac. Pose primitive based human action recognition in videos or still images[A]. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, June 2008, pp. 1-8.
    [154] S. Ali and M. Shah. Human action recognition in videos using kinematic features and multiple instance learning[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(2), 2010, pp. 288-303.
    [155] M. Bregonzio, S. Gong, T. Xiang. Recognizing action as clouds of space-time interest points[A]. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, June 2009, pp. 1948-1955.
    [156] Z. Zhang, Y. Hu, S. Chan. Motion context: A new representation for human action recognition[A]. In Proceedings of the European Conference on Computer Vision, Vol. 4, 2008, pp. 817-829.
    [157] D. Weinland, E. Boyer. Action recognition using exemplar-based embedding[A]. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, 2008, pp. 1-8.
    [158] B. Sapp, C. Jordan, and B. Taskar. Adaptive pose priors for pictorial structures[A]. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, 2010, pp. 422-429.
    [159] B. Sapp, A. Toshev, and B. Taskar. Cascaded Models for Articulated Pose Estimation[A]. In Proceedings of the European Conference on Computer Vision, 2010, pp. 406-420.
    [160] X. He, S. Yan, Y. Hu, P. Niyogi, H.-J. Zhang. Face Recognition Using Laplacianfaces[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(3), 2005, pp. 328-340.
    [161] D. Cai, X. He, J. Han, H.-J. Zhang. Orthogonal Laplacianfaces for Face Recognition[J]. IEEE Transactions on Image Processing, 15(11), 2006, pp. 3608-3614.
    [162] D. Cai, X. He, J. Han. Spectral Regression for Efficient Regularized Subspace Learning[A]. Proc. of the IEEE International Conference on Computer Vision. Rio de Janeiro, Brazil, 2007.
    [163] W. J. Krzanowski. Principles of Multivariate Analysis: A User's Perspective[M]. New York: Oxford University Press, 1988.
    [164] X. He, D. Cai, S. Yan, and H.-J. Zhang. Neighborhood Preserving Embedding[A]. 10th IEEE International Conference on Computer Vision, 2005.
    [165] D. Cai, X. He, J. Han. SRDA: An Efficient Algorithm for Large Scale Discriminant Analysis[J]. IEEE Transactions on Knowledge and Data Engineering, 2007.
    [166] C. Rother, V. Kolmogorov, and A. Blake. GrabCut: Interactive foreground extraction using iterated graph cuts[J]. ACM Transactions on Graphics, vol. 23, 2004, pp. 309-314.
    [167] H. Spath. Cluster Dissection and Analysis: Theory, FORTRAN Programs, Examples. Translated by J. Goldschmidt[M]. New York: Halsted Press, 1985.
    [168] M. Blank, L. Gorelick, E. Shechtman, M. Irani, R. Basri. Actions as Space-Time Shapes[A]. IEEE International Conference on Computer Vision, 2005.
    [169] S. B. Wang, A. Quattoni, L.-P. Morency, D. Demirdjian. Hidden Conditional Random Fields for Gesture Recognition[A]. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, 2006.
    [170] A. Quattoni, S. Wang, L.-P. Morency, M. Collins, T. Darrell. Hidden-state Conditional Random Fields[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(10), 2007, pp. 1848-1852.
    [171] Liu Fawang, Jia Yunde. Human action recognition based on manifold learning and hidden conditional random fields[J]. Journal of Software, 19(10), 2008, pp. 69-77.
    [172] T. F. Cootes, C. J. Taylor, D. H. Cooper, J. Graham. Active shape models - their training and application[J]. Computer Vision and Image Understanding, 61, 1995, pp. 38-59.
    [173] A. Fathi and G. Mori. Action recognition by learning mid-level motion features[A]. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, 2008.
    [174] J. Lafferty, A. McCallum, F. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data[A]. In Proceedings of 18th International Conf. on Machine Learning. Morgan Kaufmann, 2001, pp. 282-289.
    [175] A. Bruhn, J. Weickert, and C. Schnörr. Lucas/Kanade meets Horn/Schunck: combining local and global optical flow methods[J]. International Journal of Computer Vision (IJCV), 61(3), 2005, pp. 211-231.
    [176] I. N. Junejo, E. Dexter, I. Laptev, P. Pérez. View-independent action recognition from temporal self-similarities[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(1), 2011, pp. 172-185.
    [177] S. Ali, A. Basharat, M. Shah. Chaotic invariants for human action recognition[A]. In Proceedings of the IEEE International Conference on Computer Vision, 2007.
    [178] http://mocap.cs.cmu.edu/
