Improvement of Action Recognition Using Web Images
Details
  • English Title: Improvement of Action Recognition Using Web Images
  • Author: WEN Hao
  • English Author: WEN Hao; School of Electronics and Information Engineering, Anhui University
  • Keywords: Web learning; transfer learning; action recognition; dense trajectory; dictionary learning
  • English Keywords: Web learning; transfer learning; action recognition; dense trajectory; dictionary learning
  • Journal Code: WJFZ
  • English Journal Title: Computer Technology and Development
  • Affiliation: School of Electronics and Information Engineering, Anhui University
  • Publication Date: 2018-09-21
  • Journal: 计算机技术与发展 (Computer Technology and Development)
  • Year: 2019
  • Issue: 01 (Vol. 29, No. 261)
  • Fund: Natural Science Foundation of Anhui Province (1508085MF120)
  • Language: Chinese
  • CNKI Article ID: WJFZ201901007
  • Page Count: 4
  • Pages: 37-40
  • CN: 61-1450/TP
Abstract
Given the growing maturity of commercial visual search engines, Web data may be the next important data source for scaling up visual recognition. We observe that Web images retrieved by querying an action name depict discriminative action scenes, and that this discriminative scene information complements the temporal information available in videos. On this basis, we propose a method that exploits a large number of Web images to enhance action recognition. The framework extracts dense trajectory features from action videos, combines them with Web image features, and feeds the fused representation into a support vector machine for training and classification. Because this is a cross-domain learning problem, we introduce a cross-domain dictionary learning algorithm to process the Web images so that their features can be used effectively, addressing the domain gap between the Web image domain and the video domain. Since Web images can be obtained online with little effort, the method enhances action recognition at almost zero additional cost. Experimental results on the KTH and YouTube datasets show that the proposed method effectively improves the accuracy of human action recognition.
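To make the described pipeline concrete, the following is a minimal Python sketch, not the authors' implementation: it assumes dense trajectory features and matched Web image features have already been extracted and aligned per video (the .npy file names and loaders are hypothetical placeholders), and it substitutes scikit-learn's generic DictionaryLearning for the cross-domain dictionary learning step described in the abstract.

# Minimal sketch (not the authors' code): fuse dense-trajectory video features
# with dictionary-coded Web image features and train a linear SVM, as outlined
# in the abstract. Feature extraction is assumed done; file names are hypothetical.
import numpy as np
from sklearn.decomposition import DictionaryLearning
from sklearn.preprocessing import normalize
from sklearn.svm import LinearSVC

def encode_web_images(web_feats, n_atoms=256):
    """Sparse-code Web image features over a learned dictionary.

    Stand-in for the paper's cross-domain dictionary learning step; a generic
    dictionary is learned here instead of a transferable dictionary pair.
    """
    dl = DictionaryLearning(n_components=n_atoms,
                            transform_algorithm="omp",
                            transform_n_nonzero_coefs=10,
                            random_state=0)
    return dl.fit(web_feats).transform(web_feats)

def fuse(video_feats, image_codes):
    """Concatenate L2-normalized video and image representations per sample."""
    return np.hstack([normalize(video_feats), normalize(image_codes)])

# Hypothetical precomputed inputs: one row per training video.
video_feats = np.load("dense_trajectory_features.npy")   # encoded dense trajectories
web_feats = np.load("web_image_features.npy")             # matched Web image features
labels = np.load("action_labels.npy")

X = fuse(video_feats, encode_web_images(web_feats))
clf = LinearSVC(C=1.0).fit(X, labels)
print("training accuracy:", clf.score(X, labels))

The point carried over from the abstract is that the image-domain representation is re-encoded (here by sparse coding over a learned dictionary) before fusion, so the gap between the Web image domain and the video domain is reduced before SVM training.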
