Research on Moving Human Detection Algorithms for Real-Time Video Surveillance Systems
Abstract
This thesis presents an in-depth study of moving human detection techniques for video surveillance and implements a video-based moving human detection system.
     Intelligent video surveillance (IVS) is a popular research direction in computer vision. Following user-defined rules, an IVS system can automatically detect potential dangers or collect business information. Human detection is one of its main research topics; it is widely applied not only in intelligent video surveillance but also in intelligent human-machine interfaces, virtual reality, and other fields.
     This thesis decomposes moving human detection into two sub-problems. First, motion detection segments the moving parts of the video; then the extracted motion regions are classified as human or non-human. The motion detection studied here assumes a static camera. The most common approach is background subtraction, but because of the system's requirement for detection speed, this thesis extracts moving objects with a three-frame differencing method combined with blob merging. Once a motion region is obtained, it is classified with a machine-learning approach: a number of human and non-human samples are selected from the INRIA dataset, their Haar-like and HOG features are extracted, and a human classifier is trained with the AdaBoost algorithm. This classifier is then used to decide whether an unknown detection window contains a human.
     In testing, the system achieved good detection results at a speed of 30 frames per second. Building on previous work, this thesis makes several improvements: a blob merging technique that addresses the shortcomings of three-frame differencing, an extended Haar-like feature set, multi-scale scanning of detection windows, and dimensionality reduction of the HOG feature, which avoids the large amount of time consumed by SVM training.
This thesis studies moving human body detection in video surveillance in depth and implements a video-based moving human body detection system.
     Video surveillance involves many research topics, including camera calibration, motion segmentation and tracking, object classification, multi-camera fusion, and high-level semantic understanding, and it has become one of the most active research directions in computer vision. It has many practical applications with great potential economic value and has attracted strong interest from research institutes and researchers. Among these topics, human detection is a main research direction; it is applied not only in intelligent video surveillance but also in intelligent human-machine interfaces, virtual reality, and other fields.
     Moving human detection is divided into two sub-problems in this thesis. First, motion detection segments the moving regions of the video; then the extracted motion regions are classified. The motion detection studied here assumes a static camera, which is simpler than the case of a moving camera. For human recognition, a machine learning approach is used. Both the motion detection stage and the human detection stage present many difficulties to be resolved.
     With a static camera, the most direct way to detect moving objects in real time is frame differencing, while the most frequently used method is background subtraction. Frame differencing computes the difference between two consecutive frames and thresholds it, yielding a motion region made up of white pixels. To handle slowly moving objects, double frame differencing can be used: three consecutive frames are taken and two differences are computed. Background subtraction is a widely used motion segmentation method. It estimates a background model that contains no moving objects, locates the motion region by computing the difference between the current frame and the background model, and dynamically updates the background model with the detection result. Different background subtraction algorithms differ mainly in the background model and the updating scheme they adopt. The most common algorithm describes the probability distribution of each pixel's gray value with a statistical model; in practice a Gaussian distribution is most often used. When updating the background model, different updating coefficients are applied according to the detection result, which determines whether the previous distribution is modified. Background subtraction is used in many applications beyond video surveillance, such as virtual reality, teleconferencing, and three-dimensional modeling.
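The following is a minimal sketch of such a per-pixel Gaussian background model with detection-dependent updating coefficients. The running-average update, the threshold of k standard deviations, and the parameter names (alpha_bg, alpha_fg) are illustrative assumptions, not the exact scheme of the thesis.

```python
import numpy as np

def update_gaussian_background(frame, mean, var, k=2.5,
                               alpha_bg=0.05, alpha_fg=0.001):
    """One update step of a per-pixel Gaussian background model.

    frame, mean and var are float arrays of the same shape (grayscale).
    Pixels more than k standard deviations from the mean are foreground;
    background and foreground pixels use different updating coefficients.
    """
    diff = frame - mean
    foreground = diff ** 2 > (k ** 2) * var           # binary motion mask

    # choose the updating coefficient per pixel according to the detection result
    alpha = np.where(foreground, alpha_fg, alpha_bg)
    mean = mean + alpha * diff
    var = (1.0 - alpha) * var + alpha * diff ** 2
    var = np.maximum(var, 1e-4)                       # keep the variance positive
    return foreground, mean, var
```

A pixel that keeps matching the background distribution is absorbed quickly (alpha_bg), while a foreground pixel only slowly changes the previous distribution (alpha_fg).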
     To achieve high speed, a three-frame differencing algorithm combined with blob merging is adopted. Three-frame differencing is an extension of frame differencing and is simple to implement: a pixel is considered to be legitimately moving if its intensity has changed significantly both between the current frame and the previous frame and between the current frame and the frame before that. During detection the threshold is updated in real time to adapt to changes in the background. Frame differencing is generally not effective at extracting the entire shape of a moving object: pixels interior to an object of uniform intensity are not included in the set of "moving" pixels. To overcome this problem, an algorithm is adopted that recovers the pixels interior to the object and merges the blobs into a single whole. After the foreground image is obtained, the connected regions are extracted from this binary image to obtain the size and location of each moving object; a two-pass scanning label algorithm is used to extract the connected regions.
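A compact sketch of the three-frame differencing rule and the blob extraction step, in Python with NumPy/SciPy. The fixed threshold, the minimum blob area, and the use of scipy.ndimage.label in place of the thesis's two-pass scanning label algorithm are assumptions for illustration.

```python
import numpy as np
from scipy import ndimage

def three_frame_difference(prev2, prev1, curr, thresh=25):
    """A pixel is 'moving' only if it differs significantly from BOTH the
    previous frame and the frame before that (grayscale uint8 inputs)."""
    d1 = np.abs(curr.astype(np.int16) - prev1.astype(np.int16)) > thresh
    d2 = np.abs(curr.astype(np.int16) - prev2.astype(np.int16)) > thresh
    return d1 & d2                                    # binary foreground mask

def extract_blobs(mask, min_area=50):
    """Connected-component extraction (a stand-in for the two-pass labeling
    used in the thesis); returns bounding boxes (x1, y1, x2, y2) of the blobs."""
    labels, n = ndimage.label(mask)
    boxes = []
    for i, sl in enumerate(ndimage.find_objects(labels), start=1):
        if sl is not None and (labels[sl] == i).sum() >= min_area:
            boxes.append((sl[1].start, sl[0].start, sl[1].stop, sl[0].stop))
    return boxes
```

Merging overlapping or nearby bounding boxes into one blob then recovers the interior pixels that the differencing step misses.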
     After the motion region is obtained, the next task is to classify it, that is, to judge whether it belongs to a human or not. This is a typical pattern recognition problem, and a machine learning approach is adopted to classify the unknown region. The basic procedure of human recognition based on machine learning is to extract features of the object, select a suitable machine learning algorithm to train a classifier on these features, and finally use the resulting classifier to classify the object. The selected features may be shape descriptors (such as contours), color descriptors (such as skin color), or a combination of several features.
     For feature selection, a combination of Haar-like and HOG features is adopted. The Haar-like feature, also called a rectangle feature, is the difference between the sums of pixels inside different rectangles; five kinds of rectangle features are used in this thesis. The Histogram of Oriented Gradients (HOG), based on contour and gradient information, is a histogram formed by projecting the gradient directions of all the pixels in a rectangle. Each detection window consists of several overlapping blocks, and each block consists of several cells; within each cell the gradients are projected onto several direction bins to form a histogram. The histograms of the cells are then concatenated into one large feature vector and normalized, so each block has a corresponding HOG feature. Because of the large number of features, the integral image is introduced: it can be computed in one pass over the original image, and the sum of the pixels within any rectangle can then be computed with one addition and two subtractions, which increases the computation speed dramatically. Since every block is composed of 4 cells, each HOG feature is a 36-dimensional vector, so Fisher linear discriminant analysis is used to reduce its dimensionality.
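A minimal sketch of the integral image and a two-rectangle Haar-like feature computed from it; the specific rectangle layout shown is only one of the five kinds mentioned above.

```python
import numpy as np

def integral_image(img):
    """Integral image with a zero first row/column, computed in one pass:
    ii[y, x] = sum of img[:y, :x]."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in rectangle (x, y, w, h): one addition, two subtractions."""
    return ii[y + h, x + w] + ii[y, x] - ii[y, x + w] - ii[y + h, x]

def haar_two_rect(ii, x, y, w, h):
    """Two-rectangle Haar-like feature: left half minus right half."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```

And a sketch of the Fisher linear discriminant used to reduce the 36-dimensional block HOG feature: projecting each block's HOG vector onto the returned direction yields a one-dimensional feature per block. The regularization term is an assumption added for numerical stability.

```python
def fisher_direction(X_pos, X_neg, reg=1e-6):
    """Fisher linear discriminant direction w maximizing class separation;
    X_pos and X_neg are (n_samples, 36) arrays of block HOG features."""
    mu_p, mu_n = X_pos.mean(axis=0), X_neg.mean(axis=0)
    Sw = np.cov(X_pos, rowvar=False) + np.cov(X_neg, rowvar=False)  # within-class scatter
    w = np.linalg.solve(Sw + reg * np.eye(Sw.shape[0]), mu_p - mu_n)
    return w / np.linalg.norm(w)
```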
     To train the human classifier, the AdaBoost algorithm is adopted: weak classifiers are trained first and then combined into a strong classifier, and the training procedure is in effect a feature selection procedure. First, every sample is initialized with a weight; then all the samples are classified by every weak classifier, and an error rate is computed for each weak classifier, equal to the sum of the weights of the samples it misclassifies. Weak classifiers with smaller error rates are therefore better. In each round of feature selection, the weak classifier with the smallest error rate is selected. Before the next round, the sample weights are updated: the weights of correctly classified samples are lowered and the weights of misclassified samples are increased, so that the samples that are hard to classify receive more attention in the next round.
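A compact sketch of the AdaBoost training loop described above, using single-feature threshold stumps as the weak classifiers. The stump form and the exhaustive threshold search are simplifying assumptions; in the thesis each weak classifier is built on a Haar-like or reduced HOG feature.

```python
import numpy as np

def adaboost_train(X, y, n_rounds=50):
    """Discrete AdaBoost with threshold stumps; X is (n_samples, n_features),
    y holds labels in {-1, +1}. Each round selects the stump with the lowest
    weighted error, then re-weights the samples."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)                   # every sample starts with an equal weight
    strong = []
    for _ in range(n_rounds):
        best = None
        for j in range(d):                    # feature selection by exhaustive search
            for thr in np.unique(X[:, j]):
                for sign in (1, -1):
                    pred = np.where(sign * (X[:, j] - thr) < 0, 1, -1)
                    err = w[pred != y].sum()  # sum of weights of misclassified samples
                    if best is None or err < best[0]:
                        best = (err, j, thr, sign, pred)
        err, j, thr, sign, pred = best
        alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-10))
        strong.append((alpha, j, thr, sign))
        w *= np.exp(-alpha * y * pred)        # lower correct, raise misclassified weights
        w /= w.sum()
    return strong

def adaboost_predict(strong, X):
    """Weighted vote of the selected weak classifiers."""
    score = sum(a * np.where(s * (X[:, j] - t) < 0, 1, -1) for a, j, t, s in strong)
    return np.sign(score)
```

In practice the threshold search is implemented more efficiently, but the error computation and weight update are the parts that correspond to the procedure above.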
     In testing, the system achieved good detection results at a detection speed of 30 frames per second. Building on the work of other researchers, this thesis proposes several new ideas: a blob merging algorithm that addresses the drawbacks of frame differencing, an extended Haar-like feature set, multi-scale scanning of the detection window, and dimensionality reduction of the HOG feature, which avoids the heavy training cost of an SVM.
