Abstract
Crowd counting is of significant value in the field of intelligent surveillance. To address scale variation, non-uniform density distribution, and occlusion in crowds, a crowd counting method based on a Multi-scale Multi-task Convolutional Neural Network (MMCNN) was proposed. First, a novel adaptive human-shaped kernel was used to generate density maps that describe the crowd information, removing the influence of crowd occlusion. Second, a multi-scale convolutional neural network was constructed to handle scale variation, and a multi-task learning mechanism that simultaneously estimates the density map and the crowd density level was used to handle non-uniform density distribution. Finally, a weighted loss function was designed to improve counting accuracy. The method was evaluated on the UCF_CC_50 and World Expo'10 datasets, and the evaluation verified the effectiveness of the adaptive human-shaped kernel. The experimental results show that, compared with the method proposed by Sindagi et al. (SINDAGI V A, PATEL V M. CNN-based cascaded multi-task learning of high-level prior and density estimation for crowd counting. Proceedings of the 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance. Piscataway, NJ: IEEE, 2017: 1-6), the proposed method reduced the Mean Absolute Error (MAE) and Mean Squared Error (MSE) on the UCF_CC_50 dataset by about 1.7 and 45, respectively. Compared with the method proposed by Zhang et al. (ZHANG Y, ZHOU D, CHEN S, et al. Single-image crowd counting via multi-column convolutional neural network. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2016: 589-597), the proposed method reduced the MAE on the World Expo'10 dataset by about 1.5. On a real-world bus dataset, the counting error was only 0 to 3 people, which indicates that the proposed method is practical.
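The MAE and MSE figures quoted in the abstract are errors over per-image counts. The following is a minimal sketch of how such metrics are typically computed, not the authors' evaluation code. It assumes that the predicted count of an image is the sum of its estimated density map, which is the usual convention in density-map-based counting. The abstract does not say whether MSE is reported with or without a square root, so the sketch returns both forms. The function names and the sample counts are hypothetical.

```python
import math


def count_from_density_map(density_map):
    """Estimate a crowd count by summing every cell of a 2-D density map.

    Assumption: the density map is a list of rows of non-negative floats
    whose total mass equals the number of people in the image.
    """
    return sum(sum(row) for row in density_map)


def counting_errors(predicted, actual):
    """Return (MAE, MSE, RMSE) over per-image predicted and true counts."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    diffs = [p - a for p, a in zip(predicted, actual)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    mse = sum(d * d for d in diffs) / len(diffs)
    return mae, mse, math.sqrt(mse)


# Hypothetical counts for four test images (illustrative, not from the paper).
predicted = [12.0, 30.0, 7.0, 51.0]
actual = [10, 33, 7, 50]
mae, mse, rmse = counting_errors(predicted, actual)

# A toy 2x2 density map whose mass corresponds to one person.
toy_count = count_from_density_map([[0.25, 0.25], [0.25, 0.25]])
```

With these sample values the per-image differences are 2, -3, 0 and 1, so the MAE is 1.5 and the plain MSE is 3.5. When comparing against published tables, it is worth checking which MSE form a given paper reports before drawing conclusions.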
References
[1]RYAN D, DENMAN S, SRIDHARAN S, et al. An evaluation of crowd counting methods, features and regression models[J]. Computer Vision and Image Understanding, 2015, 130(C):1-17.
[2]FELZENSZWALB P F, GIRSHICK R B, MCALLESTER D, et al. Object detection with discriminatively trained part-based models[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010, 32(9):1627-1645.
[3]GAO C, LIU J, FENG Q, et al. People-flow counting in complex environments by combining depth and color information[J]. Multimedia Tools and Applications, 2016, 75(15):9315-9331.
[4]LUO J, WANG J, XU H, et al. Real-time people counting for indoor scenes[J]. Signal Processing, 2016, 124:27-35.
[5]ANTIC B, LETIC D, CULIBRK D, et al. K-means based segmentation for real-time zenithal people counting[C]//Proceedings of the 2009 16th IEEE International Conference on Image Processing. Piscataway, NJ:IEEE, 2009:2565-2568.
[6]RAO A S, GUBBI J, MARUSIC S, et al. Estimation of crowd density by clustering motion cues[J]. The Visual Computer, 2015, 31(11):1533-1552.
[7]CHAN A B, LIANG Z S J, VASCONCELOS N. Privacy preserving crowd monitoring:counting people without people models or tracking[C]//Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC:IEEE Computer Society, 2008:1-7.
[8]JI L N, CHEN Q K, CHEN Y J, et al. Real-time crowd counting method from video stream based on GPU[J]. Journal of Computer Applications, 2017, 37(1):145-152.
[9]HASHEMZADEH M, FARAJZADEH N. Combining keypoint-based and segment-based features for counting people in crowded scenes[J]. Information Sciences, 2016, 345:199-216.
[10]SIVA P, SHAFIEE M J, JAMIESON M, et al. Scene invariant crowd segmentation and counting using scale-normalized Histogram of Moving Gradients (HoMG)[J]. arXiv Preprint, 2016, 2016:1602.00386.
[11]ZHANG C, LI H, WANG X, et al. Cross-scene crowd counting via deep convolutional neural networks[C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC:IEEE Computer Society, 2015:833-841.
[12]OÑORO-RUBIO D, LÓPEZ-SASTRE R J. Towards perspective-free object counting with deep learning[C]//Proceedings of the 2016 European Conference on Computer Vision. Berlin:Springer, 2016:615-629.
[13]HU Y, CHANG H, NIAN F, et al. Dense crowd counting from still images with convolutional neural networks[J]. Journal of Visual Communication and Image Representation, 2016, 38:530-539.
[14]SHENG B, SHEN C, LIN G, et al. Crowd counting via weighted VLAD on dense attribute feature maps[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2016, 28(8):1788-1797.
[15]KANG D, DHAR D, CHAN A B. Crowd counting by adapting convolutional neural networks with side information[J]. arXiv Preprint, 2016, 2016:1611.06748.
[16]SHI Z L, YE Y D, WU Y P, et al. Crowd counting using rank-based spatial pyramid pooling network[J]. Acta Automatica Sinica, 2016, 42(6):866-874.
[17]ZHANG Y, ZHOU D, CHEN S, et al. Single-image crowd counting via multi-column convolutional neural network[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC:IEEE Computer Society, 2016:589-597.
[18]SINDAGI V A, PATEL V M. CNN-based cascaded multi-task learning of high-level prior and density estimation for crowd counting[C]//Proceedings of the 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance. Piscataway, NJ:IEEE, 2017:1-6.
[19]MARSDEN M, MCGUINNESS K, LITTLE S, et al. ResnetCrowd:a residual deep learning architecture for crowd counting, violent behaviour detection and crowd density level classification[C]//Proceedings of the 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance. Piscataway, NJ:IEEE, 2017:1-7.
[20]ZHANG Y, ZHOU D, CHEN S, et al. Single-image crowd counting via multi-column convolutional neural network[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC:IEEE Computer Society, 2016:589-597.
[21]ZEILER M D, RANZATO M, MONGA R, et al. On rectified linear units for speech processing[C]//Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway, NJ:IEEE, 2013:3517-3521.
[22]WANG T, LI G, LEI J, et al. Crowd counting based on MMCNN in still images[C]//Proceedings of the 2017 Scandinavian Conference on Image Analysis. Berlin:Springer, 2017:468-479.
[23]FU M, XU P, LI X, et al. Fast crowd density estimation with convolutional neural networks[J]. Engineering Applications of Artificial Intelligence, 2015, 43:81-88.
[24]IDREES H, SALEEMI I, SEIBERT C, et al. Multi-source multi-scale counting in extremely dense crowd images[C]//Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC:IEEE Computer Society, 2013:2547-2554.
[25]KANG D, MA Z, CHAN A B. Beyond counting:comparisons of density maps for crowd analysis tasks: counting, detection, and tracking[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2017, PP(99):1-1.
[26]QIN X H, WANG X F, ZHOU X, et al. Counting people in various crowd density scenes using support vector regression[J]. Journal of Image and Graphics, 2013, 18(4):392-398.