Video Advertisement Power Recognition Based on Video-audio Feature Fusion

Authors: CHENG Jun (程俊); JI Xiang (吉祥); MA Yun-feng (马云峰); ZHANG Xi-long (张喜龙); DAI Yong-heng (戴永恒)
Keywords: video advertisement; video power; video-audio feature
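Journal name (Chinese): KJPL
Journal name (English): Journal of China Academy of Electronics and Information Technology
Institution: China Academy of Electronics and Information Technology
Publication date: 2019-01-20
Publisher: Journal of China Academy of Electronics and Information Technology (中国电子科学研究院学报)
Year: 2019
Issue: v.14; No.81
Funding: China Postdoctoral Science Foundation (2015M581145)
Language: Chinese
Page: KJPL201901020
Page count: 3
CN: 01
ISSN: 11-5401/TN
Classification code: 102-104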

Abstract

The power of a video advertisement directly affects the establishment of commercial brands and the sales of goods. However, the recognition of video advertisement power does not yet meet practical needs. To let a computer recognize the power of advertising videos automatically and quantitatively, this paper proposes an algorithm that recognizes video advertisement power directly from the video's visual and audio features. The algorithm fuses BoW, GIST, color moment, color histogram, and audio features. The experimental results show that fusing visual and audio features recognizes video advertisement power well, and that fusing multiple features achieves higher accuracy than using any single feature alone.
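The abstract does not say how the individual features are combined. One common possibility is feature-level (early) fusion: compute each descriptor, normalize it, and concatenate the results into one vector for a classifier. The sketch below illustrates that idea under this assumption only; it is not the authors' implementation. It computes two of the named color descriptors on a toy frame, while the audio vector is a made-up placeholder and BoW and GIST are omitted. All function names (`color_moments`, `color_histogram`, `l2_normalize`, `fuse`) and the bin count are hypothetical choices for illustration.

```python
import math


def color_moments(pixels):
    """Mean, standard deviation and skewness per RGB channel (9 values)."""
    feats = []
    n = len(pixels)
    for c in range(3):
        vals = [p[c] for p in pixels]
        mean = sum(vals) / n
        var = sum((v - mean) ** 2 for v in vals) / n
        third = sum((v - mean) ** 3 for v in vals) / n
        # Signed cube root of the third central moment.
        skew = math.copysign(abs(third) ** (1.0 / 3.0), third)
        feats.extend([mean, math.sqrt(var), skew])
    return feats


def color_histogram(pixels, bins=4):
    """Joint RGB histogram with bins**3 cells, normalized to sum to 1."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin index in 0..bins-1.
        ri, gi, bi = r * bins // 256, g * bins // 256, b * bins // 256
        hist[ri * bins * bins + gi * bins + bi] += 1.0
    return [h / len(pixels) for h in hist]


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (all-zero vectors pass through)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def fuse(feature_blocks):
    """Early fusion: normalize each block separately, then concatenate."""
    fused = []
    for block in feature_blocks:
        fused.extend(l2_normalize(block))
    return fused


# Toy frame: four RGB pixels (assumption: 8-bit channels).
frame = [(255, 0, 0), (250, 10, 5), (0, 0, 255), (10, 20, 240)]
# Placeholder audio descriptor; a real one would come from an audio toolbox.
audio_stub = [0.2, 0.5, 0.1]

fused = fuse([color_moments(frame), color_histogram(frame), audio_stub])
print(len(fused))
```

Normalizing each block before concatenation keeps a high-dimensional or large-valued descriptor from dominating the fused vector. The fused vector would then be passed to a classifier; the reference list cites LIBSVM, but the abstract itself does not name the classifier used.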

References

[1] X. Ji, J. Han, X. Hu, K. Li, F. Deng, J. Fang, L. Guo, and T. Liu, “Retrieving video shots in semantic brain imaging space using manifold-ranking,” in Image Processing (ICIP), 2011 18th IEEE International Conference on, 2011: 3633-3636.
[2] J. Han, X. Ji, X. Hu, D. Zhu, K. Li, X. Jiang, G. Cui, L. Guo, and T. Liu, “Representing and Retrieving Video Shots in Human-Centric Brain Imaging Space,” IEEE Trans. Image Process., 2013, 22(7): 2723-2736.
[3] ZHU Fang, WU Li, CHEN Fei-ling, and YUAN Wei-zhong, “Research on the application and development of intelligent video surveillance terminals in the Internet of Things,” Journal of China Academy of Electronics and Information Technology, 2011, 6(6): 561-566.
[4] F. Li and P. Perona, “A bayesian hierarchical model for learning natural scene categories,” IEEE Conference on Computer Vision and Pattern Recognition, 2005, 2: 524-531.
[5] D. G. Lowe, “Object recognition from local scale-invariant features,” The Proceedings of the Seventh IEEE International Conference on Computer Vision, 1999, 2: 1150-1157.
[6] A. Oliva and A. Torralba, “Modeling the shape of the scene: A holistic representation of the spatial envelope,” Int. J. Comput. Vis., 2001, 42(3): 145-175.
[7] A. Oliva and A. Torralba, “Building the gist of a scene: The role of global image features in recognition,” Prog. Brain Res., 2006, 155: 23-36.
[8] O. Lartillot and P. Toiviainen, “A matlab toolbox for musical feature extraction from audio,” in Proc. Int’l Conf. on Digital Audio Effects, 2007: 237-244.
[9] C.-C. Chang and C.-J. Lin, “LIBSVM: a library for support vector machines,” ACM Trans. Intell. Syst. Technol., 2011, 2(3): 27.
