Nonlinear Block Sparse Dictionary Selection for Video Summarization

Chinese title: 采用非线性块稀疏字典选择的视频总结
Authors: MA Mingyang (马明阳); MEI Shaohui (梅少辉); WAN Shuai (万帅)
Keywords: video summarization; sparse representation; nonlinearity; block sparsity; dictionary selection
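Journal code: XAJT
Journal: Journal of Xi'an Jiaotong University
Affiliation: School of Electronics and Information, Northwestern Polytechnical University
Publication date: 2018-12-19 10:14
Publisher: Journal of Xi'an Jiaotong University (西安交通大学学报)
Year: 2019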
Volume: 53
Funding: National Natural Science Foundation of China (61671383); Fundamental Research Funds for the Central Universities (3102018AX001)
Language: Chinese
File ID: XAJT201905019
Page count: 7
Issue: 05
CN: 61-1069/T
Pages: 148-154

Abstract

Existing video summarization methods based on sparse representation do not fully account for the nonlinear relationships among video frames or for the block sparsity of key frames. This paper proposes a video summarization method based on nonlinear block sparse dictionary selection. First, to capture the nonlinear relationships among frames, the original video samples are mapped into a high-dimensional space by a kernel function, so that linearly inseparable samples become linearly separable; this turns the nonlinear problem into a linear one and yields a nonlinear sparse dictionary selection model. Second, to exploit the block sparsity of key frames, the video frames are divided into frame blocks whose contents are similar, and a nonlinear block sparse dictionary selection model is established to extract key-frame blocks. Finally, a kernelized simultaneous block orthogonal matching pursuit (KSBOMP) algorithm is designed to optimize the proposed model. Experiments on a benchmark video dataset show that the proposed algorithm significantly improves the F-score of video summarization at low computational complexity, which verifies the effectiveness of jointly using nonlinearity and block sparsity.
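The three steps in the abstract (kernel mapping, frame blocks, greedy pursuit) can be illustrated with a small sketch. It is not the authors' KSBOMP implementation: the record gives no formulas, so the RBF kernel, the parameters `gamma` and `ridge`, the fixed-length blocks, the stopping rule (a fixed number of blocks), and all function names are assumptions made for illustration. Each frame is assumed to be a plain list of feature values.

```python
import math

def rbf_kernel_matrix(frames, gamma=1.0):
    """Gram matrix K[i][j] = exp(-gamma * ||x_i - x_j||^2) (assumed kernel)."""
    def sqdist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [[math.exp(-gamma * sqdist(a, b)) for b in frames] for a in frames]

def solve(M, B):
    """Solve M @ X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + list(B[i]) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0.0:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def select_key_blocks(frames, block_size, num_blocks, gamma=1.0, ridge=1e-6):
    """Greedily pick frame blocks that best reconstruct all frames in kernel space."""
    n = len(frames)
    K = rbf_kernel_matrix(frames, gamma)
    # Consecutive frames are grouped into fixed-length blocks (an assumption).
    blocks = [list(range(s, min(s + block_size, n))) for s in range(0, n, block_size)]
    chosen, support = [], []
    # corr[i][j] = <phi(x_i), residual of frame j>; with nothing selected
    # the residual of frame j is phi(x_j) itself, so corr starts as K.
    corr = [row[:] for row in K]
    for _ in range(min(num_blocks, len(blocks))):
        # Block score: squared correlations summed over the block's frames
        # and over all residuals (the "simultaneous" part).
        scores = [-1.0 if b in chosen else
                  sum(corr[i][j] ** 2 for i in blk for j in range(n))
                  for b, blk in enumerate(blocks)]
        best = max(range(len(blocks)), key=scores.__getitem__)
        chosen.append(best)
        support.extend(blocks[best])
        # Least-squares coefficients in feature space:
        # A = (K_SS + ridge * I)^-1 K_S, using kernel values only.
        K_SS = [[K[i][j] + (ridge if i == j else 0.0) for j in support] for i in support]
        A = solve(K_SS, [K[i][:] for i in support])
        # Updated correlations: K - K[:, S] @ A.
        corr = [[K[i][j] - sum(K[i][s] * A[t][j] for t, s in enumerate(support))
                 for j in range(n)] for i in range(n)]
    return sorted(chosen)

# Usage: two "scenes" of four one-dimensional frames each, in blocks of two.
frames = [[0.0], [0.1], [0.2], [0.3], [5.0], [5.1], [5.2], [5.3]]
print(select_key_blocks(frames, block_size=2, num_blocks=2))
```

With these toy frames the two selected blocks come from different scenes: once a block from one scene is chosen, the residuals of that scene's frames are close to zero, so the next pick falls in the other scene. The kernel trick shows up in the fact that the mapped samples are never formed explicitly; every step uses only entries of `K`.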

