Abstract
This paper proposes a video object detection algorithm based on global visual optimization. Building on the flow-guided feature aggregation (FGFA) algorithm, it focuses on the trade-off between detection accuracy and running time. First, following the idea of global visual optimization, a perceptual hashing algorithm computes the global visual similarity between the two ends of a frame segment before multi-frame feature aggregation, in order to judge how strongly the temporal information within the current segment is correlated. Second, consecutive frames are used as input to further exploit the temporal information of the video: features of adjacent frames are aggregated along the motion path into the features of the current frame, which yields a better representation of the video features. Experiments on ILSVRC show that, with global visual optimization as a pre-processing step, the proposed algorithm achieves somewhat higher accuracy and speed for video object detection than the original FGFA algorithm.
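The similarity check between the two ends of a frame segment can be illustrated with a minimal sketch. It assumes a common DCT-based perceptual hash (a 32×32 downsample, the 8×8 low-frequency block, and a median threshold) compared by Hamming distance; the abstract does not specify the hash variant actually used. The function names, the grayscale array input, and the `max_distance` threshold are hypothetical choices made for illustration only.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

def shrink(gray, size=32):
    """Downsample a 2-D grayscale array to size x size by block averaging.

    Assumes both image dimensions are at least `size`.
    """
    h, w = gray.shape
    rows = np.linspace(0, h, size + 1).astype(int)
    cols = np.linspace(0, w, size + 1).astype(int)
    out = np.empty((size, size))
    for r in range(size):
        for c in range(size):
            out[r, c] = gray[rows[r]:rows[r + 1], cols[c]:cols[c + 1]].mean()
    return out

def phash(gray, size=32, keep=8):
    """Return a boolean hash of keep*keep bits for a grayscale frame."""
    small = shrink(np.asarray(gray, dtype=float), size)
    d = dct_matrix(size)
    coeffs = d @ small @ d.T           # 2-D DCT of the downsampled frame
    low = coeffs[:keep, :keep]         # keep the low-frequency block
    med = np.median(low.flatten()[1:]) # median computed without the DC term
    return (low > med).flatten()

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return int(np.count_nonzero(a != b))

def segment_is_coherent(first, last, max_distance=10):
    """Hypothetical gate: treat the segment as temporally correlated when
    the hashes of its first and last frames are close enough."""
    return hamming(phash(first), phash(last)) <= max_distance
```

In a pipeline of this kind, such a gate would run before multi-frame aggregation, so that neighbouring frames are aggregated only when the segment ends look globally similar. How the original method uses the similarity score is not stated in the abstract.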