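Research on Methods for Detecting Human Targets in Video Images

English title: Algorithm Research for Human Body Detection in Video Image
Author: Wang Chuanxu (王传旭)
Degree level: Doctoral
Discipline: Cartography and Geographic Information Systems
Chinese keywords: 智能视频监控 (intelligent video surveillance); 人体目标检测 (human target detection); 高斯混合模型 (Gaussian mixture model); 贝叶斯分类 (Bayesian classification); 维纳一步预测 (Wiener one-step prediction)
English keywords: Intelligent surveillance; Human body detection; Mixture of Gaussians; Bayes classification; Wiener one-step prediction
Degree year: 2007
Supervisor: Liu Zhishen (刘智深)
Discipline code: 070503
Degree-granting institution: Ocean University of China (中国海洋大学)
Submission date: 2007-04-05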

Abstract

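This thesis studies key techniques for intelligent video surveillance systems based on computer vision. Such a system generally comprises three levels: human target detection, behavior understanding, and high-level semantic output. Human target detection is the foundation for the latter two modules and is the topic of this thesis. The work explores three areas, proposes improved methods, and demonstrates their effectiveness experimentally. The main contributions are as follows:
     (1) Video filtering: Based on the characteristics of impulse noise in video images, a simplified pulse-coupled neural network (PCNN) model is used as a classifier that divides image pixels into two classes, noise-free pixels and noise-corrupted pixels; a median filter is then applied only to the corrupted pixels. Simulation comparisons show that, while still removing the noise, the proposed algorithm preserves edge and texture detail better than the conventional median filter.
     (2) A human target detection method should work whether or not the person moves relative to the background. Segmentation methods based on background modeling can detect only human targets that are moving against the background; for that case, this thesis proposes two improved algorithms.
     ① To address common problems in background modeling, the background is modeled with a Gaussian mixture model, which adapts well to slowly varying backgrounds (gradual illumination changes, swaying leaves, and so on); the background region is segmented under this model so that human targets can be detected. Under abrupt illumination changes, however, this method performs poorly. As an improvement, the thesis first proposes a simple method for detecting abrupt illumination changes and then applies a second segmentation to the affected frames, exploiting the fact that the spatial texture of pixels remains stable. Experiments show that the improved algorithm makes the conventional Gaussian mixture segmentation more robust.
     ② With many algorithms, the segmented foreground human target contains large holes, so the target image is badly incomplete and cannot be repaired by conventional morphological filtering. This thesis proposes a segmentation method based on the intra-frame neighborhood correlation of the foreground human region and the inter-frame continuity of the background region. The method of [11] is used for the initial segmentation; the thesis analyzes its principle and shortcomings and the reason holes appear in the detected human targets. As the improvement, a correlation coefficient measures the similarity between a foreground pixel and its spatial neighbors within the frame, and the inter-frame correlation of the pixel is computed as well; the two are used for a second segmentation, which repairs the initially segmented foreground region well and yields a more complete human image.
     (3) The two methods above can detect only human targets that move against the background, which limits their applicability. The thesis therefore investigates human target detection based on skin color. The method of [71] is analyzed and a new parameter prediction scheme is tried. According to the experimental analysis in that reference, the frame-to-frame change of the skin color region in HSV space under varying illumination is a kind of Brownian motion, that is, a stationary random process; Wiener one-step prediction is therefore used to predict the 3-D affine transformation parameters. Experiments show that under essentially constant illumination the method is sensitive and detects skin well; within a certain range of illumination change it predicts the skin region distribution of the current frame well and detects human skin regions effectively, showing some robustness to illumination change.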
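The filtering idea in item (1) can be illustrated with a minimal sketch. It is not the thesis's implementation: the simplified PCNN classifier is not reproduced here, and a plain median-deviation test (the hypothetical `is_impulse`, with an assumed `threshold` of 50 grey levels) stands in for it. What the sketch does show is the two-stage structure described above: classify each pixel as corrupted or clean, then median-filter only the corrupted ones so that clean pixels, including edge pixels, are left untouched.

```python
from statistics import median


def neighbours(img, r, c):
    """Values in the 3x3 window around (r, c), centre excluded, clipped at borders."""
    rows, cols = len(img), len(img[0])
    return [
        img[i][j]
        for i in range(max(0, r - 1), min(rows, r + 2))
        for j in range(max(0, c - 1), min(cols, c + 2))
        if (i, j) != (r, c)
    ]


def is_impulse(img, r, c, threshold):
    """Stand-in for the PCNN classifier (an assumption, not the thesis's method):
    flag a pixel when it deviates from its neighbourhood median by more than threshold."""
    return abs(img[r][c] - median(neighbours(img, r, c))) > threshold


def selective_median_filter(img, threshold=50):
    """Replace only the pixels flagged as impulse noise; copy all others unchanged."""
    out = [row[:] for row in img]
    for r in range(len(img)):
        for c in range(len(img[0])):
            if is_impulse(img, r, c, threshold):
                out[r][c] = median(neighbours(img, r, c))
    return out


if __name__ == "__main__":
    noisy = [[10, 10, 10], [10, 255, 10], [10, 10, 10]]
    for row in selective_median_filter(noisy):
        print(row)
```

On a flat patch with one saturated pixel the outlier is replaced, while a clean step edge passes through unchanged, which is the edge-preserving behaviour attributed above to the classify-then-filter design.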
This thesis investigates key techniques for intelligent surveillance based on computer vision. Such a system generally consists of three parts: human body detection, behavior understanding, and high-level judgment output. Human body detection is the basis for the latter two modules, which is why this thesis focuses on human body detection in image sequences. The work covers three aspects and includes several innovations, which have been validated through simulation experiments. They are introduced as follows.
     (1) Video image filter design
     According to the characteristics of impulse noise in video images, a pulse-coupled neural network (PCNN) is applied to classify pixels into those contaminated by impulse noise and those not contaminated; a median filter is then used to smooth the contaminated pixels. Tests show that this algorithm removes impulse noise effectively and preserves image edges and texture better than the conventional median filter.
     (2) An algorithm for detecting human bodies in video images should work well whether or not the person is moving. Segmentation methods based on background modeling can detect a human body only while it keeps moving, and they fail when it stays still. Two improved algorithms are put forward here.
     ① A mixture of Gaussians (MoG) is used to model the background, which suits slowly changing backgrounds (e.g., gradual illumination changes and swaying leaves). This traditional method fails, however, under abrupt illumination changes. An improved method is put forward to overcome this weakness.
     ② With many methods the segmented human body image is incomplete, and the holes cannot be repaired with morphological techniques. A new method is proposed, based on the observation that foreground pixels within a neighborhood inside a human body image are closely correlated, while background pixels at a fixed position are consistent from frame to frame. Tests show that it works well and yields a more complete human body image.
     (3) The two methods above can detect a human body only while it is moving in the scene. The following algorithm is based on human skin color detection, which compensates for this limitation. After a thorough analysis of the method in [71], a new parameter prediction algorithm is adopted. Tests show that the algorithm detects human skin (e.g., the face) with high sensitivity and good detection performance when the illumination is stable, and that it segments most of the skin area when the illumination changes to some extent, which shows robustness to illumination variation.
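The Wiener one-step prediction named in the keywords and used in item (3) can be sketched as follows. This is an illustration under stated assumptions, not the thesis's code: one scalar parameter is treated as a time series, its sample mean is removed, and a linear predictor is fitted by solving the Wiener-Hopf (Yule-Walker) equations with the Levinson-Durbin recursion. The function names, the biased autocorrelation estimate, and the predictor order are all choices made for this example.

```python
def autocorrelation(x, max_lag):
    """Biased sample autocorrelation r[0..max_lag] of a zero-mean sequence."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) / n
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Wiener-Hopf equations for the one-step predictor coefficients.

    Returns a[0..order-1] such that x_hat[n] = sum(a[j] * x[n - 1 - j]).
    """
    a = []
    err = r[0]  # prediction error power of the order-0 predictor
    for m in range(order):
        if err <= 0.0:
            a.append(0.0)  # degenerate (e.g. constant) input: no further gain
            continue
        k = (r[m + 1] - sum(a[j] * r[m - j] for j in range(m))) / err
        a = [a[j] - k * a[m - 1 - j] for j in range(m)] + [k]
        err *= 1.0 - k * k
    return a


def wiener_one_step_predict(history, order):
    """Predict the next sample of a scalar series from its past values."""
    n = len(history)
    if n <= order:
        raise ValueError("need more samples than the predictor order")
    mean = sum(history) / n
    x = [v - mean for v in history]
    a = levinson_durbin(autocorrelation(x, order), order)
    return mean + sum(a[j] * x[n - 1 - j] for j in range(order))


if __name__ == "__main__":
    # Hypothetical per-frame values of one parameter (made-up numbers).
    series = [0.10, 0.12, 0.11, 0.13, 0.12, 0.14, 0.13, 0.15]
    print(wiener_one_step_predict(series, order=2))
```

In a tracker of the kind described, one such predictor would be run per parameter and the predicted values used to place the skin color model for the next frame; how the thesis couples the parameters is not stated in the abstract, so the per-parameter treatment here is an assumption.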

References

[1] D. Marr, Vision, W.H. Freeman and Company, 1982.
    [2] Xu Guangyou (徐光佑), Computer Vision (《计算机视觉》), 1999. (in Chinese)
    [3] Liang Wang, Weiming Hu and Tieniu Tan, “Recent developments in human motion analysis”, Pattern Recognition, vol.36, no. 3, pp. 585-601, 2003.
    [4] A. Lipton, H. Fujiyoshi, R. Patil, “Moving target classification and tracking from real-time video”, In Proc. IEEE Workshop on Applications of Computer Vision, Princeton, NJ, 1998.8
    [5] R. Collins et al, “A system for video surveillance and monitoring: VSAM final report”,Carnegie Mellon University: Technical Report CMU-RI-TR-00-12, 2000
    [6] J. Barron, D. Fleet, S. Beauchemin, “Performance of optical flow techniques”, International Journal of Computer Vision, vol. 12, no. 1, pp. 42-77, 1994.
    [7] Baisheng Chen, Yunqi Lei, and Wangwei Li, “A Novel Background Model for Real-Time Vehicle Detection”, ICSP’04 Proceedings, pp. 1276-1279.
    [8] C. Stauffer, W.E.L. Grimson. “Adaptive Background Mixture Models for Real-Time Tracking”. In CVPR’99, Vol. 2, pp. 246-252, June 1999.
    [9] Baisheng Chen, Yunqi Lei, and Wangwei Li, “A Novel Background Model for Real-Time Vehicle Detection”, ICSP’04 Proceedings, pp. 1276-1279.
    [10] Michael Harville, Gaile Gordon and John Woodfill, Foreground Segmentation Using Adaptive Mixture Models in Color and Depth [A]. IEEE 2001.3~11.
    [11] Liyuan Li, Weimin Huang, Irene Yu-Hua Gu, and Qi Tian, “Statistical modeling of complex backgrounds for foreground object detection”, IEEE Transactions on Image Processing, vol. 13, no. 11, pp. 1259-1272, November 2004.
    [12] D. Dementhon. Spatio-Temporal Segmentation of Video by Hierarchical Mean Shift Analysis [C]. Statistical Methods in Video Processing Workshop, 2002
    [13] H. Greenspan et al. A Probabilistic Framework for Spatio-Temporal Video Representation and Indexing [C]. European Conference on Computer Vision, LNCS1614, Springer-Verlag, Berlin, Germany, 2002. 461-475.
    [14] Oliver N., Pentland A., Berard F. LAFTER: A real-time face and lips tracker with facial expression recognition. Pattern Recognition, 2000, 33: 1369-1382
    [15] Tao Lin-Mi, Xu Guang-You. Color issues and applications in machine vision. Chinese Science Bulletin, 2001, 46(3): 178-190 (in Chinese) (陶霖密, 徐光祐. 机器视觉中的颜色问题及应用. 科学通报, 2001, 46(3): 178-190)
    [16] Funt B., Barnard K., Martin L. Is machine colour constancy good enough? In: Proceedings of the 5th European Conference on Computer Vision, University of Freiburg, Germany, 1998, 445-459
    [17] Störring M. Computer vision and human skin colour [Ph.D. dissertation]. Computer Vision and Media Technology Laboratory, Aalborg University, Denmark, 2004, http://www.cvmt.dk/~mst
    [18] Gong S. G., McKenna S., Psarrou A. Dynamic Vision: From Image to Face Recognition. London: Imperial College Press, 2000, 66-71, 136-145
    [19] Yang J., Lu W., Waibel A. Skin color modeling and adaptation. In: Proceedings of the 3rd Asian Conference on Computer Vision, Hong Kong, China, 1998, 687-694
    [20] Comaniciu D., Meer P. Real-time tracking of non-rigid objects using mean shift. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, South Carolina, USA, 2000, 142-149
    [21] Lu J., Gu Q., Plataniotis K. A comparative study of skin-color models. In: Proceedings of the International Conference on Image Analysis and Recognition, Toronto, 2005
    [22] Störring M., Andersen H., Granum E. Skin colour detection under changing lighting condition. In: Araujo, Dias J. eds. Proceedings of the 7th Symposium on Intelligent Robotics Systems, Coimbra, Portugal, 1999, 187-195
    [23] Cho K. M., Jang J. H., Hong K. S. Adaptive skin-color filter. Pattern Recognition, 2001, 34(6): 1067-1073
    [24] Zhu Q., Cheng K., Wu C., Wu Y. Adaptive learning of an accurate skin-color model. In: Proceedings of the 6th IEEE International Conference on Automatic Face and Gesture Recognition (FGR’04), 2004, 37-42
    [25] Comaniciu D., Ramesh V. Robust detection and tracking of human faces with an active camera. In: Proceedings of the 3rd IEEE International Workshop on Visual Surveillance, Dublin, Ireland, 2000, 11-18
    [26] Birchfield S. Elliptical head tracking using intensity gradients and color histograms. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Santa Barbara, CA, USA, 1998, 232-237
    [27] Yoo T. W., Oh I. S. A fast algorithm for tracking human faces based on chromatic histograms. Pattern Recognition Letters, 1999, 20(6): 967-978
    [28] Zhou Zhi-Yong, Zhou Ji-Liu, Liu Zhi-Ming, He Xin. An algorithm for detecting and tracking human faces based on chromatic histograms and backprojection. Journal of Guizhou University of Technology, 2003, 32(3): 46-50 (in Chinese) (周志勇, 周激流, 刘智明, 贺新. 基于彩色和投影的人脸检测和跟踪算法. 常州工业大学学报, 2003, 32(3): 46-50)
    [29] Soriano M., Martinkauppi B., Huovinen S. et al. Adaptive skin color modeling using the skin locus for selecting training pixels. Pattern Recognition, 2003, 36(3): 681-690
    [30] Stern H., Efros B. Adaptive color space switching for face tracking in multi-colored lighting environments. In: Proceedings of the 5th IEEE International Conference on Automatic Face and Gesture Recognition, Washington, DC, USA, 2002, 249-255
    [31] Shafer S. A. Using color to separate reflection components. Color Research Application, 1985, 10(4): 210-218
    [32] Klinker G. J., Shafer S. A., Kanade T. Color image analysis with an intrinsic reflection model. In: Proceedings of the 2nd International Conference on Computer Vision, Rome, Italy, 1988, 292-296
    [33] Klinker G. J., Shafer S. A., Kanade T. A physical approach to color image understanding. International Journal of Computer Vision, 1990, 4(1): 1-38
    [34] Sato Y., Ikeuchi K. Temporal color space analysis of reflection. Journal of the Optical Society of America, 1994, 11(11): 2990-3002
    [35] Störring M., Andersen H., Granum E. Physics-based modelling of human skin under mixed illuminants. Journal of Robotics and Autonomous Systems, 2001, 35(3-4): 131-142
    [36] Buluswar S. D., Bruce A. D. Color models for outdoor machine vision. Computer Vision and Image Understanding, 2002, 85(2): 71-99
    [37] Pavlidis I., Symosek P. The imaging issue in an automatic face/disguise detection system. In: Proceedings of IEEE Workshop on Computer Vision Beyond the Visible Spectrum: Methods and Applications (CVBVS 2000), Hilton Head, South Carolina, 2000, 15-24
    [38] Socolinsky D. A., Selinger A., Neuheisel J. D. Face recognition with visible and thermal infrared imagery. Computer Vision and Image Understanding, 2003, 91(1): 72-114
    [39] Kong S. G., Heo J., Abidi B. R. et al. Recent advances in visual and infrared face recognition: A review. Computer Vision and Image Understanding, 2005, 97(1): 103-135
    [40] Li Jiang, Yu Wen-Xian, Kuang Gan-Yao, Song Hai-Na. A compound face recognition system design. Journal of National University of Defense Technology, 2003, 25(3): 45-49
    [41] Chen X., Flynn P. J., Bowyer K. W. IR and visible light face recognition. Computer Vision and Image Understanding, 2005, 99(3): 332-358
    [42] Heo J., Kong S., Abidi B., Abidi M. Fusion of visual and thermal signatures with eyeglass removal for robust face recognition. In: Proceedings of IEEE Workshop on Object Tracking and Classification Beyond the Visible Spectrum, in conjunction with CVPR’04, Washington, DC, 2004, 94-99
    [43] Kenneth R. Castleman. Digital Image Processing. NJ: Prentice-Hall, 1996.
    [44] R. Eckhorn, H. J. Reitboeck, M. Arndt, P. Dicke. Feature Linking via Synchronization Among Distributed Assemblies. Neural Computation 2, pp. 293-307, 1990.
    [45] J. L. Johnson. Pulse Coupled Neural Nets: Translation, Rotation, Scale, Distortion and Intensity Signal Invariance for Images. Applied Optics, Vol. 33, no. 26, pp. 6239-6253, 1994.
    [46] J. L. Johnson, D. Ritter. Observation of Periodic Waves in a Pulse Coupled Neural Network. Optics Letters, Vol. 18, 1253, 1993.
    [47] J. L. Johnson, M. L. Padgett. PCNN Models and Applications. IEEE Trans. on Neural Networks, Vol. 10, no. 3, pp. 480-498, 1999.
    [48] J. M. Kinser. Recent Research in Pulse-Coupled Neural Networks, SPIE AeroSense Conf., Orlando, FL, 1996.
    [49] T. Lindblad, “Inherent Features of Wavelets and Pulse Coupled Neural Networks,” IEEE Trans. Neural Networks, Vol. 10, No. 3, pp: 9204-1092, 1999.
    [50] A.N. Skourikhine, “A Pulse Coupled Neural Network for Image Smoothing and Segmentation,” International Symposium on Computational Intelligence, Kosice, Slovakia, 2000.
    [51] G. Kuntimad, H.S. Ranganath, “Perfect Image Segmentation Using Pulse Coupled Neural Networks,” IEEE Trans. Neural Networks, Vol. 10, No. 3, pp. 591-598, 1999.
    [52] R. Eckhorn, H.J. Reitboeck, M. Arndt, and P. Dicke, “Feature Linking via Synchronization among Distributed Assemblies: Simulations of Results from Cat Visual Cortex,” Neural Computation 2 (MIT), pp. 293-307, 1990.
    [53] H. S. Ranganath, G. Kuntimad. Iterative segmentation using Pulse Coupled Neural Networks. SPIE volume 2760, pp. 543-554.
    [54] Lin Kai-yan, Wu Jun-hui, Xu Li-hong. Summary of segmentation methods in color image [J]. Journal of Image and Graphics, 2005: 1~10. [林开颜, 吴军辉, 徐立鸿. 彩色图像分割方法综述 [J]. 中国图象图形学报, 2005: 1~10.]
    [55] Wang Ze-bing, Yang Chao-hui. Research of segmentation techniques in color image [J]. Journal of Digital Television and Digital Video, 2005, 274: 20~24. [王泽兵, 杨朝晖. 彩色图像分割技术研究 [J]. 数字电视与数字视频, 2005, 274: 20~24.]
    [56] Wei Hong-bo, Lv Zhen-su, Jiang Tian-zi, Liu Xin-yan. Survey of image segmentation techniques [J]. Gansu Science Transaction, 2004, 19~24. [魏弘博, 吕振肃, 蒋田仔, 刘新艳. 图像分割技术纵览 [J]. 甘肃科学学报, 2004, 19~24.]
    [57] Stauffer C. and Grimson W. Adaptive background mixture models for real-time tracking [A]. In Proc. IEEE Conference on Computer Vision and Pattern Recognition [C], Fort Collins, Colorado, 1999. 246~252.
    [58] Arandjelović and R. Cipolla. Incremental learning of temporally-coherent Gaussian mixture models [A]. In Proc. British Machine Vision Conference [C], 2005, 2: 759~768.
    [59] Paul Rosin. Thresholding for Change Detection [A]. Computer Vision and Image Understanding, 2002, 79~95.
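    [60] Wang Junli (王军利), Wang Shugen (王树根). An image shadow detection method based on the RGB color space. Information Technology (信息技术), 2002, 26(12): 7-9. (in Chinese)
    [61] Zhongwu (忠武), Gao Guangzhu (高广珠) et al. Shadow removal in object detection from image sequences. Application Research of Computers (计算机应用研究), 2000, 17(12): 19-20. (in Chinese)
    [62] Zhang Li (张丽), Li Zhineng (李志能). Vehicle tracking and detection using an adaptive background model in HSV space based on shadow detection. Journal of Image and Graphics (中国图像图形学报), 2004, 8(7): 778-782. (in Chinese)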
    [63] Manuele Bicego, Marco Cristani and Vittorio Murino, “Unsupervised scene analysis: A hidden Markov model approach,” Computer Vision and Image Understanding,2005.1-20.
    [64] Jing Zhong and Stan Sclaroff, “Segmenting Foreground Objects from a Dynamic Textured Background via a Robust Kalman Filter,” Proceedings of the Ninth IEEE International Conference on Computer Vision (ICCV’03) 2003.
    [65] C. Stauffer and W. Grimson, “Learning patterns of activity using realtime tracking,” IEEE Trans. Pattern Anal. Machine Intell., vol. 22, pp. 747-757, Aug. 2000.
    [66] Makito Seki, Toshikazu Wada, Hideto Fujiwara and Kazuhiko Sumi, “Background Subtraction based on Cooccurrence of Image Variations,” Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’03).
    [67]M. Song and H. Wang. Highly efficient incremental estimation of Gaussian mixture models for online data stream clustering. Intelligent Computing: Theory And Applications, 2005.1-1
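    [68] Mao Shisong (茆诗松), Bayesian Statistics (贝叶斯统计), Higher Education Press, 1999, 10.170 (in Chinese)
    [69] Hu Guangshu (胡广书), Digital Signal Processing: Theory, Algorithms and Implementation, Tsinghua University Press, 2001. (in Chinese)
    [70] Chen Duansheng (陈锻生), Liu Zhengkai (刘政凯). A survey of skin color detection techniques. Chinese Journal of Computers (计算机学报), vol. 29, no. 2, pp. 194-207, Feb. 2006. (in Chinese)
    [71] Leonid Sigal, Stan Sclaroff, Vassilis Athitsos. Skin Color-Based Video Segmentation under Time-Varying Illumination. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 7, pp. 862-877, July 2004.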
    [72] M.J. Jones and J.M. Rehg, “Statistical Color Models with Application to Skin Detection,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. I, pp. 274-280, 1999.
    [73] Hongming Zhang, Wen Gao, Xilin Chen, Debin Zhao, Object detection using spatial histogram features. Image and Vision Computing 24 (2006) 327-341.
    [74] M.H. Yang and N. Ahuja, “Gaussian Mixture Model for Human Skin Color and Its Application in Image and Video Databases,” Proc. SPIE Conf. Storage and Retrieval for Image and Video Databases, pp. 458-466, 1999.
    [75] M. Storring, H.J. Andersen, and E. Granum, “Skin Colour Detection under Changing Lighting Conditions,” Proc. Seventh Symp. Intelligent Robotics Systems, pp. 187-195, 1999.
    [76] J.C. Terrillon and S. Akamatsu, “Comparative Performance of Different Chrominance Spaces for Color Segmentation and Detection of Human Faces in Complex Scene Images,” Proc. Vision Interface, pp. 180-187, 1999.
    [77] M. Soriano, B. Martinkauppi, S. Huovinen, and M. Laaksonen,“Skin Detection in Video under Changing Illumination Conditions,” Proc. Int’l Conf. Pattern Recognition, vol. 1, pp. 839-842, 2000.
