Natural Video Prediction Based on WGAN Network
Detailed Information
  • English Title: Natural Video Prediction Based on WGAN Network
  • Authors: LI Min; TONG Ming-lei; FAN Lv-yuan; NAN Hao
  • Affiliation: School of Electronic and Information Engineering, Shanghai University of Electric Power
  • Keywords: video prediction; Wasserstein generative adversarial network (WGAN); multi-scale; Laplacian pyramid
  • Journal: Instrumentation Technology (仪表技术); CNKI journal code: YBJI
  • Publication date: 2019-04-15
  • Year: 2019
  • Issue: No.360
  • Fund: Natural Science Foundation of Shanghai (16ZR1413300)
  • Language: Chinese
  • Record ID: YBJI201904001
  • Pages: 5-9 (5 pages)
  • CN: 31-1266/TH
Abstract
Computer vision technology has achieved great success in both academia and industry, and in recent years video prediction has become an important research area. Existing video prediction models based on generative adversarial networks require carefully balancing the training of the generator and the discriminator, and the generated samples lack diversity. To address these problems, this paper adopts the Wasserstein generative adversarial network (WGAN) in place of the standard GAN and uses a cascade of convolutional networks organized as a Laplacian pyramid to train a multi-scale convolutional model that predicts the next few frames from the input video sequence; clearer images are then generated by iterating from low resolution to high resolution. Finally, experiments are carried out on the UCF-101 dataset and compared against different network structures. The results show that the improved network outperforms existing video generation models on this dataset.
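To make the approach concrete, the sketch below (PyTorch; not the authors' released code) illustrates the two ideas the abstract combines: a coarse-to-fine, Laplacian-pyramid-style generator that predicts the next frame at low resolution and then refines the upsampled result at full resolution, and a WGAN critic trained with the Wasserstein objective and weight clipping as in the original WGAN. The layer sizes, number of pyramid scales, number of conditioning frames, and toy tensors are illustrative assumptions, not the paper's actual configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F


class ScaleGenerator(nn.Module):
    # Predicts the next frame at one pyramid scale from the stacked input frames
    # (plus, at the finer scale, the upsampled coarse prediction).
    def __init__(self, in_channels, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, hidden, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 3, 3, padding=1), nn.Tanh(),  # RGB frame in [-1, 1]
        )

    def forward(self, x):
        return self.net(x)


class Critic(nn.Module):
    # WGAN critic: scores (input frames, next frame) pairs; no sigmoid at the output.
    def __init__(self, in_channels, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, hidden, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(hidden, hidden * 2, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(hidden * 2, 1),
        )

    def forward(self, frames, next_frame):
        return self.net(torch.cat([frames, next_frame], dim=1))


def predict_multiscale(gens, frames):
    # Coarse-to-fine: predict at half resolution, upsample, then refine at full resolution.
    coarse_in = F.interpolate(frames, scale_factor=0.5, mode="bilinear", align_corners=False)
    up = F.interpolate(gens[0](coarse_in), scale_factor=2.0, mode="bilinear", align_corners=False)
    return gens[1](torch.cat([frames, up], dim=1))


n_in = 4                                            # number of conditioning frames (assumed)
gens = nn.ModuleList([ScaleGenerator(3 * n_in), ScaleGenerator(3 * n_in + 3)])
critic = Critic(3 * n_in + 3)
opt_g = torch.optim.RMSprop(gens.parameters(), lr=5e-5)
opt_c = torch.optim.RMSprop(critic.parameters(), lr=5e-5)

frames = torch.randn(2, 3 * n_in, 64, 64)           # stacked input frames (toy data)
target = torch.randn(2, 3, 64, 64)                  # ground-truth next frame (toy data)

# Critic step: maximize E[D(real)] - E[D(fake)], then clip weights (original WGAN).
fake = predict_multiscale(gens, frames).detach()
loss_c = critic(frames, fake).mean() - critic(frames, target).mean()
opt_c.zero_grad()
loss_c.backward()
opt_c.step()
for p in critic.parameters():
    p.data.clamp_(-0.01, 0.01)

# Generator step: minimize -E[D(fake)], plus an L1 term pulling the prediction toward the target.
fake = predict_multiscale(gens, frames)
loss_g = -critic(frames, fake).mean() + F.l1_loss(fake, target)
opt_g.zero_grad()
loss_g.backward()
opt_g.step()

In the paper's setting the pyramid cascade would use more scales and deeper per-scale networks and be trained on real UCF-101 clips; the sketch only shows the coarse-to-fine conditioning and one WGAN training step.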
