A Video Summarization Algorithm Based on Label Distribution Learning

English title: Label Distribution Learning for Video Summarization
Authors: Liu Yujie (刘玉杰); Tang Shunjing (唐顺静); Gao Yongbiao (高永标); Li Zongmin (李宗民); Li Hua (李华)
Keywords: video summarization; label distribution learning model; multi-label learning; key frame
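Journal code: JSJF
Journal: Journal of Computer-Aided Design & Computer Graphics
Affiliations: College of Computer & Communication Engineering, China University of Petroleum; Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences
Publication date: 2019-01-15
Publisher: Journal of Computer-Aided Design & Computer Graphics (计算机辅助设计与图形学学报)
Year: 2019
Volume: 31
Funding: National Natural Science Foundation of China (61379106, 61379082, 61227802); Natural Science Foundation of Shandong Province (ZR2015FM011, ZR2013FM036, ZR2015FM022)
Language: Chinese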
File ID: JSJF201901013
Page count: 7
Issue: 01
CN: 11-2925/TP
Pages: 106-112

Abstract

To address the complexity of model training in existing supervised video summarization algorithms, this paper proposes a new video summarization algorithm based on label distribution learning (LDL). The algorithm generates summaries through non-parametric supervised learning, using label transfer to carry the summary structure from annotated videos over to test videos of the same type. First, convolutional neural network (CNN) features and color features are extracted from the video, fused, and reduced in dimension to obtain a feature matrix. Next, the feature matrix is fed into the LDL model together with the label distributions of the training samples. Finally, key frames are selected according to the label distribution output by the model and assembled into the video summary. Experiments against other algorithms on benchmark datasets show that the summaries generated by this algorithm are highly consistent with user-created summaries and that it clearly outperforms the compared algorithms.
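The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not name the dimension-reduction method, the specific LDL algorithm, or the key-frame selection rule, so everything below is an assumption made for illustration: PCA stands in for the dimension reduction, a k-nearest-neighbour average of training label distributions stands in for the non-parametric LDL model, labels are taken to be (non-key, key), and a frame is kept when its predicted key degree exceeds 0.5. All function names are hypothetical, and the features are synthetic stand-ins for real CNN and color features.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length so both feature types are comparable."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)


def fuse_features(cnn_feats, color_feats):
    """Fuse per-frame CNN and color features by concatenation (assumed)."""
    return np.hstack([l2_normalize(cnn_feats), l2_normalize(color_feats)])


def fit_pca(features, n_components):
    """Return the mean and top principal directions (PCA is an assumption)."""
    mean = features.mean(axis=0)
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:n_components]


def knn_label_distribution(train_x, train_d, test_x, k):
    """Average the label distributions of the k nearest training frames."""
    # Pairwise Euclidean distances, shape (n_test, n_train).
    dists = np.linalg.norm(test_x[:, None, :] - train_x[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    # train_d[nearest] has shape (n_test, k, n_labels); average over neighbours.
    return train_d[nearest].mean(axis=1)


def select_key_frames(pred_d, key_label=1, threshold=0.5):
    """Return indices of frames whose key-label degree exceeds the threshold."""
    return np.flatnonzero(pred_d[:, key_label] > threshold)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for per-frame CNN and color-histogram features.
    train_raw = fuse_features(rng.normal(size=(40, 16)), rng.random(size=(40, 8)))
    test_raw = fuse_features(rng.normal(size=(10, 16)), rng.random(size=(10, 8)))
    # Synthetic annotated distributions over the (non-key, key) labels.
    key_degree = rng.random(size=40)
    train_d = np.column_stack([1.0 - key_degree, key_degree])

    mean, basis = fit_pca(train_raw, n_components=5)
    train_x = (train_raw - mean) @ basis.T
    test_x = (test_raw - mean) @ basis.T
    pred_d = knn_label_distribution(train_x, train_d, test_x, k=5)
    print(select_key_frames(pred_d))
```

Because each predicted distribution is an average of training distributions, its rows still sum to 1, and no parameters are fitted beyond the PCA projection. This mirrors the non-parametric, label-transfer flavour the abstract describes, without claiming to reproduce the authors' model.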

