Video frame prediction based on deep convolutional long short-term memory neural network
  • Title (Chinese): 基于深度卷积长短时神经网络的视频帧预测
  • Authors: ZHANG Dezheng (张德正); WENG Liguo (翁理国); XIA Min (夏旻); CAO Hui (曹辉)
  • Affiliation: Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology (Nanjing University of Information Science & Technology)
  • Keywords: video frame prediction; Convolutional Neural Network (CNN); Long Short-Term Memory (LSTM) neural network; encoding prediction; convolutional Gated Recurrent Unit (GRU)
  • Journal: Journal of Computer Applications (计算机应用)
  • Publication date: 2019-04-10
  • Year: 2019
  • Volume/Issue: Vol. 39, No. 346, Issue 06
  • Pages: 107-112 (6 pages)
  • CN: 51-1307/TP
  • ISSN: 1001-9081
  • Database record: JSJY201906018 (journal code JSJY)
  • Funding: National Natural Science Foundation of China (61503192, 61773219); Natural Science Foundation of Jiangsu Province (BK20161533); Qinglan Project of Jiangsu Province
  • Language: Chinese
Abstract
Concerning the difficulty of accurately predicting spatial structure details in video frame prediction, a deep convolutional Long Short-Term Memory (LSTM) neural network method was proposed by improving the convolutional LSTM neural network. Firstly, the input image sequence was fed into an encoding network composed of two deep convolutional LSTM networks with different numbers of channels, and the encoding network learned the change features of both the position information and the spatial structure information of the input sequence. Then, the learned change features were fed into a decoding network whose channels matched those of the encoding network, and the decoding network output the predicted next frame. Finally, the predicted frame was fed back into the decoding network to predict the following frame, and this loop was repeated a preset number of times before all predicted frames were output. In experiments on the Moving-MNIST dataset, compared with the convolutional LSTM neural network under the same number of training steps, the proposed method not only retained accurate prediction of position information but also represented spatial structure details more strongly. Moreover, after the convolutional layers of the convolutional Gated Recurrent Unit (GRU) neural network were deepened, the method also improved the representation of spatial structure details, verifying the generality of the proposed idea.
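As an illustration of the encode-then-recursively-decode scheme summarized in the abstract, the following is a minimal PyTorch sketch, not the authors' implementation: it uses a single convolutional LSTM branch instead of the paper's two-channel encoder, and all names (ConvLSTMCell, EncoderDecoderPredictor, to_frame), layer widths, and kernel sizes are illustrative assumptions.

import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    # A single convolutional LSTM cell: all four gates are computed by one 2D convolution.
    def __init__(self, in_ch, hid_ch, kernel=3):
        super().__init__()
        self.hid_ch = hid_ch
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, kernel, padding=kernel // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

class EncoderDecoderPredictor(nn.Module):
    # Encode an observed frame sequence, then recursively predict future frames.
    def __init__(self, in_ch=1, hid_ch=64):
        super().__init__()
        self.encoder = ConvLSTMCell(in_ch, hid_ch)
        self.decoder = ConvLSTMCell(in_ch, hid_ch)
        self.to_frame = nn.Conv2d(hid_ch, in_ch, kernel_size=1)  # hidden state -> frame

    def forward(self, frames, n_future):
        # frames: (batch, time, channels, height, width)
        b, t, _, hgt, wid = frames.shape
        h = frames.new_zeros(b, self.encoder.hid_ch, hgt, wid)
        c = torch.zeros_like(h)
        for step in range(t):                     # encoding pass over the input sequence
            h, c = self.encoder(frames[:, step], (h, c))
        x, preds = frames[:, -1], []
        for _ in range(n_future):                 # feed each prediction back into the decoder
            h, c = self.decoder(x, (h, c))
            x = torch.sigmoid(self.to_frame(h))
            preds.append(x)
        return torch.stack(preds, dim=1)

# Example: predict 10 future 64x64 frames from 10 observed Moving-MNIST-sized frames.
model = EncoderDecoderPredictor()
future = model(torch.rand(2, 10, 1, 64, 64), n_future=10)
print(future.shape)  # torch.Size([2, 10, 1, 64, 64])

The second loop is the part that mirrors the abstract: each predicted frame is fed back into the decoder until the preset number of future frames has been produced. Training (for example, with a per-pixel loss between predicted and ground-truth frames) is omitted from this sketch.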
References
[1]KRIZHEVSKY A,SUTSKEVER I,HINTON G E.ImageNet classification with deep convolutional neural networks[C]//Proceedings of the 25th International Conference on Neural Information Processing Systems.North Miami Beach,FL:Curran Associates Inc.,2012:1097-1105.
    [2]SIMONYAN K,ZISSERMAN A.Very deep convolutional networks for large-scale image recognition[EB/OL].[2018-10-15].https://arxiv.org/pdf/1409.1556.pdf.
    [3]SZEGEDY C,LIU W,JIA Y,et al.Going deeper with convolutions[C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition.Piscataway,NJ:IEEE,2015:1-9.
    [4]RYOO M S.Human activity prediction:early recognition of ongoing activities from streaming videos[C]//Proceedings of the 2011 IEEE International Conference on Computer Vision.Piscataway,NJ:IEEE,2011:1036-1043.
    [5]ZHU S,JIA Y,PEI M.Parsing video events with goal inference and intent prediction[C]//Proceedings of the 2011 International Conference on Computer Vision.Piscataway,NJ:IEEE,2011:487-494.
    [6]VONDRICK C,PIRSIAVASH H,TORRALBA A.Anticipating visual representations from unlabeled video[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition.Piscataway,NJ:IEEE,2016:98-106.
    [7]KOOIJ J F P,SCHNEIDER N,FLOHR F,et al.Context-based pedestrian path prediction[C]//Proceedings of the 2014 European Conference on Computer Vision,LNCS 8694.Berlin:Springer,2014:618-633.
    [8]WALKER J,GUPTA A,HEBERT M.Dense optical flow prediction from a static image[C]//Proceedings of the 2015 IEEE International Conference on Computer Vision.Piscataway,NJ:IEEE,2015:2443-2451.
    [9]MOTTAGHI R,RASTEGARI M,GUPTA A,et al.“What happens if…”learning to predict the effect of forces in images[C]//Proceedings of the 2016 European Conference on Computer Vision,LNCS 9908.Berlin:Springer,2016:269-285.
    [10]HOCHREITER S,SCHMIDHUBER J.Long short-term memory[J].Neural Computation,1997,9(8):1735-1780.
    [11]ELMAN J L.Distributed representations,simple recurrent networks,and grammatical structure[J].Machine Learning,1991,7(2/3):195-225.
    [12]李洋,董红斌.基于CNN和BiLSTM网络特征融合的文本情感分析[J].计算机应用,2018,38(11):3075-3080.(LI Y,DONG H B.Text sentiment analysis based on feature fusion of convolutional neural network and bidirectional long short-term memory network[J].Journal of Computer Applications,2018,38(11):3075-3080.)
    [13]姚煜,RYAD C.基于双向长短时记忆联结时序分类和加权有限状态转换器的端到端中文语音识别系统[J].计算机应用,2018,38(9):2495-2499.(YAO Y,RYAD C.End-to-end Chinese speech recognition system based on bidirectional long short-term memory connectionist temporal classification and weighted finite-state transducer[J].Journal of Computer Applications,2018,38(9):2495-2499.)
    [14]SUTSKEVER I,VINYALS O,LE Q V.Sequence to sequence learning with neural networks[C]//Proceedings of the 2014 Neural Information Processing Systems Conference.Cambridge,MA:MIT Press,2014:3104-3112.
    [15]BENGIO Y,SIMARD P,FRASCONI P.Learning long-term dependencies with gradient descent is difficult[J].IEEE Transactions on Neural Networks,1994,5(2):157-166.
    [16]SHI X J,CHEN Z R,WANG H,et al.Convolutional LSTM network:a machine learning approach for precipitation nowcasting[C]//Proceedings of the 28th International Conference on Neural Information Processing Systems.Cambridge,MA:MIT Press,2015:802-810.
    [17]MOLLAHOSSEINI A,CHAN D,MAHOOR M H.Going deeper in facial expression recognition using deep neural networks[C]//Proceedings of the 2016 IEEE Winter Conference on Applications of Computer Vision.Piscataway,NJ:IEEE,2016:1-10.
    [18]IOFFE S,SZEGEDY C.Batch normalization:accelerating deep network training by reducing internal covariate shift[C]//Proceedings of the 32nd International Conference on Machine Learning.Cambridge,MA:MIT Press,2015:448-456.
    [19]LESHNO M,LIN V Y,PINKUS A,et al.Multilayer feedforward networks with a nonpolynomial activation function can approximate any function[J].Neural Networks,1993,6(6):861-867.
