Using a Product Manifold distance for unsupervised action recognition

详细信息	查看全文 \| 推荐本文 \|

作者：Stephen O'Hara ; ^{svohara@cs.colostate.edu} ; Yui Man Lui ^{lui@cs.colostate.edu} ; Bruce A. Draper ^{draper@cs.colostate.edu}
关键词：Unsupervised learning ; Product Manifold ; Action recognition
刊名：Image and Vision Computing
出版年：2012
期刊代码：68_02628856
类别：et
出版时间：March, 2012
卷：30
期：3
页码：206-216
文件大小：1631 K

摘要

This paper presents a method for unsupervised learning and recognition of human actions in video. Lacking any supervision, there is nothing except the inherent biases of a given representation to guide grouping of video clips along semantically meaningful partitions. Thus, in the first part of this paper, we compare two contemporary methods, Bag of Features (BOF) and Product Manifolds (PM), for clustering video clips of human facial expressions, hand gestures, and full-body actions, with the goal of better understanding how well these very different approaches to behavior recognition produce semantically relevant clustering of data.

We show that PM yields superior results when measuring the alignment between the generated clusters and the nominal class labeling of the data set. We found that while gross motions were easily clustered by both methods, the lack of preservation of structural information inherent to the BOF representation leads to limitations that are not easily overcome without supervised training. This was evidenced by the poor separation of shape labels in the hand gestures data by BOF, and the overall poor performance on full-body actions.

In the second part of this paper, we present an unsupervised mechanism for learning micro-actions in continuous video streams using the PM representation. Unlike other works, our method requires no prior knowledge of an expected number of labels/classes, requires no silhouette extraction, is tolerant to minor tracking errors and jitter, and can operate at near real-time speed. We show how to construct a set of training 鈥渢racklets,鈥?how to cluster them using the Product Manifold distance measure, and how to perform detection using exemplars learned from the clusters. Further, we show that the system is amenable to incremental learning as anomalous activities are detected in the video stream. We demonstrate performance using the publicly-available ETHZ Livingroom data set.

地址：北京市海淀区学院路29号邮编：100083

电话：办公室：(+86 10)66554848；文献借阅、咨询服务、科技查新：66554700