Research on Human Action Detection and Recognition Methods Based on Computer Vision
Abstract
Human action detection and recognition is an important topic in human movement analysis, and it is receiving growing attention in computer vision because of its broad potential applications, such as smart surveillance, human-computer interaction, content-based video retrieval, and image compression, and its significant societal and economic value. Because human motion is non-rigid, the intra-class variations of a single action type are usually large under different conditions, while the inter-class variations between different action types are relatively small. Detecting and recognizing human actions in videos is therefore harder than detecting and recognizing other objects.
     This thesis systematically studies the detection and recognition of human actions in videos. Following the detection and recognition pipeline and the requirements of practical applications, the problems studied here include: (a) human detection in indoor environments; (b) target tracking against complex backgrounds; (c) target tracking under occlusion; (d) human action detection and recognition with a moving camera; (e) human action detection and recognition against dynamic backgrounds. Motivated by these problems, corresponding solutions are proposed. The main contributions and novelties are summarized as follows:
     1. Because the human body is prone to self-occlusion and occlusion by other objects, and because of varying viewpoints, differences in skin color across people, and the growing use of wide-angle cameras, detection algorithms that rely on color and shape cues may fail. Motivated by these issues and by the need for an extensible system, a blackboard-based algorithm for human detection in indoor environments is proposed. Detection is performed by excluding the other, non-human indoor objects. The algorithm can be improved simply by adding or removing knowledge-source (processing) modules, which makes the detection system easy to extend. Experimental results show that the blackboard-based method is effective; a minimal sketch of the pattern follows.
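     To make the architecture concrete, here is a minimal Python sketch of the blackboard pattern described above. The two knowledge sources shown (a size filter and an aspect-ratio filter) are hypothetical stand-ins for the thesis's actual processing modules, not a reproduction of them.

        # Blackboard sketch: candidate foreground regions are posted on the
        # board; each knowledge source prunes candidates it rules out as
        # non-human, so detection proceeds by exclusion.
        class Blackboard:
            def __init__(self, candidate_regions):
                self.candidates = list(candidate_regions)

        class SizeFilter:
            """Hypothetical source: reject regions too small to be a person."""
            def __init__(self, min_area):
                self.min_area = min_area
            def apply(self, board):
                board.candidates = [r for r in board.candidates
                                    if r["w"] * r["h"] >= self.min_area]

        class AspectRatioFilter:
            """Hypothetical source: standing people are taller than wide."""
            def apply(self, board):
                board.candidates = [r for r in board.candidates
                                    if 1.0 < r["h"] / r["w"] < 5.0]

        def detect_humans(regions, knowledge_sources):
            board = Blackboard(regions)
            for ks in knowledge_sources:  # extensibility: add/remove sources
                ks.apply(board)
            return board.candidates       # survivors of every exclusion rule

        # Toy usage: a person-sized region survives; a short, wide one is cut.
        regions = [{"w": 40, "h": 110}, {"w": 60, "h": 20}]
        print(detect_humans(regions, [SizeFilter(1000), AspectRatioFilter()]))

     The design point is that every exclusion rule sits behind the same apply interface, so extending the detector never touches the control loop.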
     2. A target tracking algorithm based on nonparametric clustering and multi-scale images is presented for tracking against complex backgrounds. First, a modified nonparametric clustering method automatically partitions the color space of the tracked object into histogram bins, a Gaussian function models the spatial information of each bin, and the target appearance model is defined on this basis. Next, the Bhattacharyya coefficient is used to derive a similarity function between the target model and a candidate model. A coarse-to-fine search over multi-scale pyramid images then localizes the target spatially. Finally, the optimal kernel bandwidth, obtained by maximizing a lower bound of a log-likelihood function, is used to estimate the scale of the tracked object. Experimental results show that the proposed algorithm outperforms the classical mean-shift tracker. In addition, to deal with occlusions that arise during tracking, a multi-person tracking algorithm based on human detection and the improved mean-shift tracker is presented. The key to handling occlusion is to associate the reliable tracks from before an occlusion with the temporary tracks created after it. An association likelihood built from the appearance, size, and location of the tracked targets is defined, and the optimal association is computed with the Hungarian algorithm. Experimental results show that the tracking algorithm is effective; a sketch of the similarity and association steps follows.
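     The two computational ingredients named above are standard, so a minimal sketch may help. It assumes 1-D normalized color histograms as appearance models; the random histograms are toy stand-ins for real target models, and the assignment uses SciPy's implementation of the Hungarian algorithm.

        # Sketch: Bhattacharyya similarity between appearance histograms, and
        # optimal track association via the Hungarian algorithm.
        import numpy as np
        from scipy.optimize import linear_sum_assignment

        def bhattacharyya(p, q):
            # Coefficient of two normalized histograms; 1.0 means identical.
            return float(np.sum(np.sqrt(p * q)))

        # Toy data: appearance models of reliable tracks (before occlusion)
        # and temporary tracks (after occlusion).
        rng = np.random.default_rng(0)
        before = rng.random((3, 16)); before /= before.sum(axis=1, keepdims=True)
        after = rng.random((3, 16));  after /= after.sum(axis=1, keepdims=True)

        # Minimizing negated similarity maximizes total association likelihood.
        cost = -np.array([[bhattacharyya(p, q) for q in after] for p in before])
        rows, cols = linear_sum_assignment(cost)
        for r, c in zip(rows, cols):
            print(f"track {r} -> track {c}, similarity {-cost[r, c]:.3f}")

     In the thesis the association likelihood also uses size and location; the sketch keeps only the appearance term to show the mechanics.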
     3. To handle moving cameras and dynamic backgrounds, an approach based on a shape-motion action prototype tree is introduced for action recognition. During training, action prototypes are learned via k-means clustering, a binary prototype tree is constructed over them via hierarchical k-means clustering, and the prototypes are stored in its leaf nodes. During testing, humans are first detected and tracked using appearance information to obtain a rough location of the actor; a joint probability optimization then refines the actor's location and identifies the prototype corresponding to the current frame. Finally, actions are recognized with dynamic time warping. An HMM-based frame-to-prototype matching scheme is also introduced and compared experimentally with the tree-based scheme. The approach achieves recognition rates of 91.07% on the Keck gesture dataset, 100% on the Weizmann action dataset, 95.77% on the KTH action dataset, and 99.23% on the Checkout Counter dataset. A sketch of the time-warping step follows.
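     Since dynamic time warping carries the final recognition step, here is a minimal sketch of the classic algorithm. The prototype-index sequences and the 0/1 mismatch distance are toy assumptions, not the thesis's shape-motion distance.

        # Classic O(n*m) dynamic time warping between two label sequences.
        import numpy as np

        def dtw(seq_a, seq_b, dist):
            n, m = len(seq_a), len(seq_b)
            D = np.full((n + 1, m + 1), np.inf)  # accumulated-cost matrix
            D[0, 0] = 0.0
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    d = dist(seq_a[i - 1], seq_b[j - 1])
                    D[i, j] = d + min(D[i - 1, j],      # insertion
                                      D[i, j - 1],      # deletion
                                      D[i - 1, j - 1])  # match
            return D[n, m]

        # Toy usage: per-frame prototype indices of a test sequence against a
        # reference action; the repeated frame aligns at zero cost.
        test, reference = [0, 1, 1, 2, 3], [0, 1, 2, 3]
        print(dtw(test, reference, lambda a, b: 0.0 if a == b else 1.0))  # 0.0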
     4. For moving cameras and dynamic backgrounds, a tree-based approach that integrates action detection, recognition, and segmentation is proposed. During training, a set of action prototypes is first learned via k-means clustering, and a binary tree model is then constructed over them. Each tree node stores a rejection threshold learned for fast matching during training and testing; each leaf node additionally stores a set of learned parameters: the frame indices of the training descriptors that best match the leaf, and an action class distribution. During testing, an action is first localized by rapidly matching feature descriptors extracted from sliding windows against the learned tree, and the action's location in each frame is then refined by global filtering. The action is recognized by maximizing, over the test frames, the sum of the joint probabilities of action category and action prototype. The action is segmented using a segmentation mask computed from the frame indices stored in the matched leaf nodes. The approach achieves recognition rates of 100% on both the CMU action dataset and the Weizmann dataset. A sketch of the threshold-based tree matching follows.
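     A minimal sketch of the fast matching step, assuming Euclidean distances to k-means cluster centers; the two-level tree and the threshold values below are illustrative, not the learned ones.

        # Descriptor-to-leaf matching in a prototype tree with per-node
        # rejection thresholds for early pruning.
        import numpy as np

        class Node:
            def __init__(self, center, threshold, children=(),
                         frame_ids=(), class_dist=None):
                self.center = np.asarray(center, dtype=float)  # k-means center
                self.threshold = threshold        # rejection threshold
                self.children = list(children)
                self.frame_ids = list(frame_ids)  # leaf: best-matching frames
                self.class_dist = class_dist      # leaf: class distribution

        def match(node, desc):
            # Reject early if the descriptor is too far from this center.
            if np.linalg.norm(desc - node.center) > node.threshold:
                return None
            if not node.children:
                return node                       # reached a leaf: a match
            best = min(node.children,
                       key=lambda c: np.linalg.norm(desc - c.center))
            return match(best, desc)

        # Toy usage: two leaves under one root.
        leaf_a = Node([0.0, 0.0], 1.5, frame_ids=[3, 7], class_dist={"wave": 0.9})
        leaf_b = Node([4.0, 4.0], 1.5, frame_ids=[12], class_dist={"jump": 0.8})
        root = Node([2.0, 2.0], 4.0, children=[leaf_a, leaf_b])
        hit = match(root, np.array([0.2, 0.1]))
        print(hit.frame_ids if hit else "rejected")  # -> [3, 7]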
     5. For moving cameras and dynamic backgrounds, a discriminative tree-based Hough voting technique for multi-class action detection and recognition is proposed. During training, a pair of localization trees is learned from local motion and appearance features, and a recognition tree is learned from joint global HOG-flow features via a hierarchical label-consistent k-means clustering algorithm. Every tree node stores a class distribution, and each node of the localization trees additionally stores a set of offsets relative to the object center. During testing, a small set of regions most likely to contain the actor is first found by local-feature voting through the localization trees; holistic features are then extracted from these regions, and the action is recognized by holistic-feature voting through the recognition tree. Experimental results demonstrate that the approach outperforms the state of the art on the Keck gesture dataset, the CMU action dataset, and the KTH action dataset. A sketch of the voting step follows.
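     A minimal sketch of the voting mechanics: each matched local feature casts weighted votes at its stored offsets, and peaks in the accumulator are candidate actor centers. The feature positions, offsets, and weights below are toy values, not learned ones.

        # Hough voting for localization: votes accumulate at feature + offset.
        import numpy as np

        def hough_vote(feature_positions, offsets_per_feature, image_shape):
            acc = np.zeros(image_shape)
            for (fx, fy), offsets in zip(feature_positions, offsets_per_feature):
                for (dx, dy), w in offsets:   # offsets stored in tree nodes
                    x, y = fx + dx, fy + dy
                    if 0 <= x < image_shape[0] and 0 <= y < image_shape[1]:
                        acc[x, y] += w
            return acc

        # Toy usage: two features whose votes agree on a center at (10, 12).
        votes = hough_vote([(8, 10), (12, 14)],
                           [[((2, 2), 1.0)], [((-2, -2), 1.0)]],
                           (32, 32))
        print(np.unravel_index(votes.argmax(), votes.shape))  # -> (10, 12)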
