Interactive System for Video Summarization Based on Multimodal Fusion

English title: Interactive System for Video Summarization Based on Multimodal Fusion
Authors: Zheng Li; Xiaobing Du; Cuixia Ma; Yanfeng Li; Hongan Wang
English keywords: video visualization; interaction; multimodal fusion; video summarization
Journal title (Chinese): BLGY
Journal title (English): Journal of Beijing Institute of Technology (English Edition)
Affiliations: School of Management, Hefei University of Technology; Jinling Institute of Technology, Nanjing Software Research Institute; University of Chinese Academy of Sciences; Institute of Software, Chinese Academy of Sciences
Publication date: 2019-03-15
Publisher: Journal of Beijing Institute of Technology
Year: 2019
Issue: v.28; No.99
Funding: Supported by the National Key Research and Development Plan (2016YFB1001200); the Natural Science Foundation of China (U1435220, 61232013); Natural Science Research Projects of Universities in Jiangsu Province (16KJA520003)
Language: English
Pages: BLGY201901004
Page count: 8
CN: 01
ISSN: 11-2916/T
Classification number: 31-38

Abstract

Biography videos, which are based on the lives of prominent historical figures, aim to portray the lives of great men. This paper proposes a novel interactive video summarization approach for biography videos based on multimodal fusion. The approach visualizes the features specific to biography videos and supports interaction with video content by exploiting multiple modalities. In general, the story of a movie progresses through the dialogue of its characters, and the subtitles are produced from that dialogue, so they carry the information related to the movie. Because a biography video covers different aspects of its subject's whole life, JGibbsLDA is applied to extract keywords from the subtitles. To fuse keywords and key-frames, affinity propagation is adopted to calculate the similarity between each key-frame cluster and the keywords. With this method, a video summarization based on multimodal fusion is presented that describes video content more completely. To reduce the time spent searching for video content of interest and to reveal the relationships between main characters, a kind of map is adopted to visualize video content and to interact with the video summarization. An experiment is conducted to evaluate the video summarization, and the results demonstrate that the system can facilitate the exploration of video content while improving interaction and helping users find events of interest efficiently.
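The abstract names affinity propagation (reference [21]) as the clustering step but gives no implementation details. The following is a minimal sketch of the standard message-passing algorithm, not the authors' code. The function name `affinity_propagation`, the toy 1-D `features` standing in for key-frame descriptors, the negative squared distance as similarity, and the median preference are all assumptions made for illustration.

```python
from statistics import median


def affinity_propagation(S, damping=0.5, iterations=200):
    """Cluster items from a square similarity matrix S (list of lists).

    S[k][k] is the "preference" of item k to act as an exemplar.
    Returns, for each item, the index of its exemplar.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities
    A = [[0.0] * n for _ in range(n)]  # availabilities

    for _ in range(iterations):
        # Responsibility: how well k serves as exemplar for i,
        # compared with the best competing candidate.
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(scores[j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)

        # Availability: accumulated support for k being an exemplar.
        for k in range(n):
            support = sum(max(0.0, R[j][k]) for j in range(n) if j != k)
            for i in range(n):
                if i == k:
                    new = support
                else:
                    new = min(0.0, R[k][k] + support - max(0.0, R[i][k]))
                A[i][k] = damping * A[i][k] + (1 - damping) * new

    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:
        raise ValueError("no exemplars found; adjust preference or damping")
    # Exemplars label themselves; other items join the most similar exemplar.
    return [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
            for i in range(n)]


# Hypothetical 1-D values standing in for key-frame features.
features = [0, 1, 2, 10, 11, 12]
n = len(features)
S = [[-float((a - b) ** 2) for b in features] for a in features]

# Assumed convention: use the median off-diagonal similarity as preference.
preference = median(S[i][j] for i in range(n) for j in range(n) if i != j)
for k in range(n):
    S[k][k] = preference

labels = affinity_propagation(S)
print(labels)
```

On this toy input the two groups separate, and the middle item of each group (indices 1 and 4) becomes its exemplar. With real key-frame descriptors the similarity function and the preference value would need to be chosen for the data, since the preference largely controls how many clusters emerge.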

References

[1] Feng S, Lei Z, Yi D, et al. Online content-aware video condensation[C]//CVPR, 2012.
[2] Lee Y J, Ghosh J, Grauman K. Discovering important people and objects for egocentric video summarization[C]//CVPR, 2012.
[3] Amir A H W, Iyengar G, Lin C-Y, et al. IBM research TRECVID-2003 system[C]//NIST Text Retrieval Conf (TREC), 2003.
[4] Kolenda T, Hansen L K, Larsen J, et al. Independent component analysis for understanding multimedia content[C]//IEEE Workshop on Neural Networks for Signal Processing, 2002: 757-766.
[5] Langlois T, Chambel T, Oliveira E, et al. VIRUS: video information retrieval using subtitles[C]//Proceedings of the 14th International Academic MindTrek Conference: Envisioning Future Media Environments, 2010: 197-200.
[6] Katsiouli P, Tsetsos V, Hadjifethymiades S. Semantic video classification based on subtitles and domain terminologies[C]//Proceedings of the KA-MC, http://ceur-ws.org/Vol-253/paper05.pdf, 2007.
[7] Mihalcea R, Tarau P. TextRank: bringing order into texts[C]//Proceedings of EMNLP, Association for Computational Linguistics, 2004: 404-411.
[8] Taniguchi Y, Akutsu A, Tonomura Y. Panorama excerpts: extracting and packing panoramas for video browsing[C]//Proceedings of the Fifth ACM International Conference on Multimedia, MULTIMEDIA '97, 1997: 427-436.
[9] Goldman D B, Curless B, Salesin D, et al. Schematic storyboarding for video visualization and editing[J]. ACM Transactions on Graphics, 2006, 25(3): 862-871.
[10] Hua X S, Li N S, Zhang H J. Video booklet[C]//Proceedings of International Conference on Multimedia & Expo, IEEE, 2005.
[11] Nguyen C, Niu Y, Liu F. Video summagator: an interface for video summarization and navigation[C]//Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2012: 647-650.
[12] Shah R, Narayanan P J. Interactive video manipulation using object trajectories and scene backgrounds[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2013, 23(9): 1565-1576.
[13] Ma Cuixia, Liu Yongjin, Zhao Guozhen, et al. Visualizing and analyzing video content with interactive scalable maps[C]//IEEE Transactions on Multimedia, 2016: 1-11.
[14] Park Seung-Bo, Kim Heung-Nam, Kim Hyunsik, et al. Exploiting script-subtitles alignment to scene boundary dectection in movie[C]//2010 IEEE International Symposium on Multimedia, 2010: 49-56.
[15] Yeung M M, Yeo B L. Video visualization for compact presentation and fast browsing of pictorial content[C]//Circuits & Systems for Video Technology IEEE Transactions on 7.5, 1997: 771-785.
[16] Uchihashi S. Video Manga: generating semantically meaningful video summaries[C]//Proceedings of the 7th ACM International Conference on Multimedia '99, Orlando, FL, USA, 1999: 383-392.
[17] Goldman D B, Curless B, Salesin D, et al. Schematic storyboarding for video visualization and editing[J]. ACM Trans Graph, 2006, 25(3): 862-887.
[18] Tapaswi M, Bäuml M, Stiefelhagen R. StoryGraphs: visualizing character interactions as a timeline[C]//Computer Vision and Pattern Recognition (CVPR), IEEE Conference on, Columbus, OH, 2014: 827-834.
[19] Wang Feng, Merialdo Bernard. Multi-document video summarization[C]//ICME 2009, 2009: 1326-1329.
[20] Papandreou George, Katsamanis Athanassios, Pitsikalis Vassilis, et al. Adaptive multimodal fusion by uncertainty compensation with application to audiovisual speech recognition[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2009, 17(3): 423-435.
[21] Frey B J, Dueck D. Clustering by passing messages between data points[J]. Science, 2007, 315: 972-976.
[22] Otani Mayu, Nakashima Yuta, Sato Tomokazu, et al. Textual description-based video summarization for video blogs[C]//ICME 2015, 2015: 1-6.
