Intelligent Video Surveillance Based on Action Recognition
Abstract
Abstract: With advances in computer hardware and in the field of computer vision, intelligent video surveillance has attracted growing attention from research institutions and scholars, and intelligent surveillance products have begun to appear. Current intelligent video surveillance is mainly built on object detection and tracking, and its level of intelligence is still limited. To meet the growing demand for smarter surveillance, this thesis proposes an intelligent video surveillance solution based on human action recognition.
     The thesis is organized into three parts: human detection, human tracking, and action recognition. Because detection and tracking techniques are already mature, the emphasis is placed on action recognition. On top of basic detection and tracking algorithms, we implement action recognition based on projection histograms and action recognition based on the bag-of-words and topic models, and we improve the latter into an action recognition method based on a bag of optical-flow descriptors. In previous work, word frequencies for each action class were counted over the whole frame when building the vocabulary, which can lead to mismatched words. To address this, we exploit the structure of the human body: we assume that actions are independent along the time axis while the motions of different body parts are spatially correlated. Using the local spatio-temporal maxima of the optical flow as features, we divide the human region into blocks and build a separate optical-flow descriptor vocabulary for each block, so that similar features extracted from different body parts are forced onto different words; this injects spatial information into the vocabulary. Finally, we modify the original pLSA model and use the improved pLSA model for action recognition. Experiments show that the algorithm performs well on the Weizmann and KTH video datasets.
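The abstract does not give implementation details, so the following is only an illustrative sketch of the per-body-part vocabulary idea: the tracked person box is split into a grid of blocks and each block's optical-flow descriptors are clustered separately, so that similar motions at different body parts map to different words. OpenCV's Farneback dense flow and scikit-learn's k-means are stand-ins chosen here, the grid and vocabulary sizes are hypothetical, and for brevity all flow vectors in a block are quantized rather than only the local spatio-temporal maxima described above.

```python
import numpy as np
import cv2
from sklearn.cluster import KMeans

GRID_ROWS, GRID_COLS, WORDS_PER_BLOCK = 3, 2, 20   # hypothetical settings

def flow_in_box(prev_gray, gray, box):
    """Dense Farneback optical flow inside the tracked person box."""
    x, y, w, h = box
    return cv2.calcOpticalFlowFarneback(prev_gray[y:y+h, x:x+w],
                                        gray[y:y+h, x:x+w],
                                        None, 0.5, 3, 15, 3, 5, 1.2, 0)

def split_into_blocks(flow):
    """Yield (block index, flow vectors) for each spatial block of the box."""
    h, w, _ = flow.shape
    bh, bw = h // GRID_ROWS, w // GRID_COLS
    for r in range(GRID_ROWS):
        for c in range(GRID_COLS):
            block = flow[r*bh:(r+1)*bh, c*bw:(c+1)*bw].reshape(-1, 2)
            yield r * GRID_COLS + c, block

def build_block_codebooks(training_flows):
    """One k-means codebook per block, so each body region gets its own words."""
    per_block = {i: [] for i in range(GRID_ROWS * GRID_COLS)}
    for flow in training_flows:
        for idx, vecs in split_into_blocks(flow):
            per_block[idx].append(vecs)
    return {idx: KMeans(n_clusters=WORDS_PER_BLOCK, n_init=10).fit(np.vstack(chunks))
            for idx, chunks in per_block.items()}

def word_histogram(flow, codebooks):
    """Bag-of-words histogram; the per-block id offset keeps words spatially distinct."""
    hist = np.zeros(GRID_ROWS * GRID_COLS * WORDS_PER_BLOCK)
    for idx, vecs in split_into_blocks(flow):
        words = codebooks[idx].predict(vecs) + idx * WORDS_PER_BLOCK
        np.add.at(hist, words, 1)
    return hist
```

The word-id offset per block is the step that injects spatial information into the vocabulary; everything else is ordinary bag-of-words construction.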
     To verify the effectiveness of the algorithms in real environments, we build an intelligent video surveillance platform. The platform provides basic detection and tracking algorithms, the action recognition algorithm based on the bag of optical-flow descriptors, and the action recognition algorithm based on projection histograms; it can send surveillance information to a designated mobile phone in a timely manner through an SMS module and records the detection, tracking, and recognition results. Compared with common intelligent surveillance platforms, ours is more intelligent and effective.
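As a rough, hypothetical outline of the surveillance loop described above (detect, classify, log, alert by SMS), the sketch below uses OpenCV background subtraction for detection; the classifier, SMS sender, and alert rules are placeholders, not the thesis's actual modules, and tracking is omitted for brevity.

```python
import cv2

def run_surveillance(video_source, classifier, sms_sender, log):
    """Detect people, classify their actions, log results, and raise SMS alerts."""
    cap = cv2.VideoCapture(video_source)
    bg = cv2.createBackgroundSubtractorMOG2()          # detection by background subtraction
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        mask = bg.apply(frame)
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        for c in contours:
            if cv2.contourArea(c) < 500:               # skip small blobs
                continue
            box = cv2.boundingRect(c)                  # (x, y, w, h) of a person candidate
            label = classifier.classify(frame, box)    # e.g. "walk", "run", "fall"
            log.write("{} {}\n".format(box, label))
            if label in ("fall", "fight"):             # hypothetical alert rules
                sms_sender.send("Alert: {} detected".format(label))
    cap.release()
```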
Intelligent visual surveillance has attracted significant interest in recent years, and some intelligent surveillance products have appeared in public areas as well. However, most of these products are based on moving-object detection and tracking and are not intelligent enough. In this thesis, we propose an intelligent visual surveillance solution based on human action recognition.
     Human action recognition involves three components: human detection, tracking, and action classification. Because detection and tracking techniques are already efficient enough, we focus on action classification. We implement basic human detection and tracking algorithms and present an improved approach to classifying human actions based on the bag-of-words (BoW) model and the pLSA (probabilistic Latent Semantic Analysis) model. We propose an improved feature, the local spatio-temporal maximum of the optical flow, to build our bag of words. This feature reduces the high dimensionality of the raw optical-flow template while retaining rich motion information. The recognition approach is evaluated on two datasets, KTH and Weizmann, and the results show good performance.
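For reference, here is a minimal sketch of the baseline pLSA model that the improved method builds on: EM over a video-by-visual-word count matrix, with topics playing the role of latent action themes. This is the standard formulation, not the thesis's modified model.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=100, seed=0):
    """Standard pLSA via EM on a (videos x visual words) count matrix."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_w_z = rng.random((n_topics, n_words))            # P(w|z)
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_d = rng.random((n_docs, n_topics))             # P(z|d)
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: responsibilities P(z|d,w), shape (docs, words, topics)
        joint = p_z_d[:, None, :] * p_w_z.T[None, :, :]
        resp = joint / (joint.sum(axis=2, keepdims=True) + 1e-12)
        expected = counts[:, :, None] * resp           # expected counts n(d,w) * P(z|d,w)
        # M-step: re-estimate P(w|z) and P(z|d) from expected counts
        p_w_z = expected.sum(axis=0).T
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = expected.sum(axis=1)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_w_z, p_z_d
```

A new video would then be classified by folding it in, i.e. fixing P(w|z), re-estimating its topic mixture P(z|d), and comparing that mixture against the training classes.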
     To test the method in a real environment, we develop an intelligent visual surveillance platform. The platform provides basic human detection and tracking, along with the action classification algorithm based on our improved approach and a method based on the projection histogram. Surveillance information can be sent to a mobile phone automatically. Compared with common intelligent visual surveillance platforms, ours is more intelligent and effective.
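The projection-histogram feature mentioned here is, in its common form, simply the row and column sums of the foreground silhouette; the sketch below shows one way to compute it (the bin count and normalization are assumptions, not details taken from the thesis).

```python
import numpy as np

def projection_histogram(silhouette, n_bins=32):
    """Row/column sums of a binary silhouette, resampled to a fixed length."""
    sil = (silhouette > 0).astype(np.float32)
    horiz = sil.sum(axis=0)                            # horizontal projection (column sums)
    vert = sil.sum(axis=1)                             # vertical projection (row sums)
    # resample so silhouettes of different sizes are comparable
    horiz = np.interp(np.linspace(0, 1, n_bins),
                      np.linspace(0, 1, horiz.size), horiz)
    vert = np.interp(np.linspace(0, 1, n_bins),
                     np.linspace(0, 1, vert.size), vert)
    feat = np.concatenate([horiz, vert])
    return feat / (feat.sum() + 1e-12)
```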