Attention mechanism based pedestrian trajectory prediction generation model
  • English title: Attention mechanism based pedestrian trajectory prediction generation model
  • Authors: 孙亚圣; 姜奇; 胡洁; 戚进; 彭颖红
  • English authors: SUN Yasheng; JIANG Qi; HU Jie; QI Jin; PENG Yinghong (School of Mechanical Engineering, Shanghai Jiao Tong University; School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University)
  • Keywords: trajectory prediction; Long Short Term Memory (LSTM); Generative Adversarial Network (GAN); attention mechanism; pedestrian interaction
  • Journal code: JSJY
  • English journal title: Journal of Computer Applications
  • Affiliations: School of Mechanical Engineering, Shanghai Jiao Tong University; School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University
  • Online publication date: 2018-09-18 14:12
  • Journal (Chinese title): 计算机应用
  • Year: 2019
  • Volume/Issue: Vol. 39; No. 343 (cumulative)
  • Funding: National Natural Science Foundation of China (51775332, 51675329, 51675342); State Key Laboratory of Mechanical System and Vibration (GZ2016KF001, GKZD020018); open fund of the State Key Laboratory of Smart Manufacturing for Special Vehicles and Transmission System (GZ2016KF001)
  • Language: Chinese
  • Record ID: JSJY201903010
  • Page count: 7
  • Issue: 03
  • CN: 51-1307/TP
  • Pages: 52-58
Abstract
To address the problems that the Long Short Term Memory (LSTM) network considers each pedestrian in isolation in pedestrian trajectory prediction and cannot produce predictions with multiple possibilities, an attention mechanism based generative model for pedestrian trajectory prediction, called AttenGAN, was proposed to model pedestrian interaction and probabilistically predict multiple plausible future trajectories. AttenGAN consists of a generator and a discriminator: the generator probabilistically predicts several possible futures from a pedestrian's past trajectory, while the discriminator judges whether a trajectory is real or forged by the generator, thereby pushing the generator toward predictions that conform to social norms. The generator is composed of an encoder and a decoder. At each time step, the encoder LSTM aggregates the states of the other pedestrians provided by the attention mechanism and encodes the current pedestrian's information into a hidden state. At prediction time, the hidden state of the encoder LSTM is concatenated with Gaussian noise to initialize the hidden state of the decoder LSTM, which decodes it into the predicted future trajectory. Experimental results on the ETH and UCY datasets show that AttenGAN not only produces multiple plausible, socially compliant trajectory predictions, but also improves prediction accuracy over the traditional linear model (Linear), the LSTM model, the Social LSTM (S-LSTM) model and the Social GAN (S-GAN) model, especially in scenes with dense pedestrian interaction. Visualizations of trajectories obtained by sampling the generator multiple times show that the proposed model is able to capture pedestrian interaction patterns and to jointly predict multiple plausible futures.
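The encoder-decoder structure described in the abstract can be summarized in a minimal PyTorch sketch. This is an illustrative reconstruction from the abstract only, not the authors' implementation: the layer sizes, the dot-product form of the attention, the displacement-based output and the class name AttenGANGenerator are all assumptions, and the discriminator and adversarial training loop are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttenGANGenerator(nn.Module):
    """Encoder-decoder LSTM generator with attention over other pedestrians (sketch)."""

    def __init__(self, embed_dim=16, hidden_dim=32, noise_dim=8):
        super().__init__()
        self.embed = nn.Linear(2, embed_dim)                  # (x, y) -> embedding
        self.encoder = nn.LSTMCell(embed_dim + hidden_dim, hidden_dim)
        self.init_decoder = nn.Linear(hidden_dim + noise_dim, hidden_dim)
        self.decoder = nn.LSTMCell(embed_dim, hidden_dim)
        self.to_xy = nn.Linear(hidden_dim, 2)                 # hidden state -> displacement
        self.noise_dim = noise_dim

    def attend(self, h_self, h_others):
        # Dot-product attention over the other pedestrians' hidden states
        # (assumed form; the abstract only says an attention mechanism summarizes them).
        scores = torch.matmul(h_others, h_self.unsqueeze(-1)).squeeze(-1)
        weights = F.softmax(scores, dim=-1)
        return torch.sum(weights.unsqueeze(-1) * h_others, dim=-2)

    def forward(self, obs_traj, pred_len=12):
        # obs_traj: (obs_len, num_peds, 2) observed positions of all pedestrians
        obs_len, num_peds, _ = obs_traj.shape
        h = obs_traj.new_zeros(num_peds, self.encoder.hidden_size)
        c = torch.zeros_like(h)

        # Encoder: fuse each pedestrian's position with the attended context.
        for t in range(obs_len):
            e = self.embed(obs_traj[t])
            context = torch.stack([
                self.attend(h[i], torch.cat([h[:i], h[i + 1:]], dim=0))
                for i in range(num_peds)
            ])
            h, c = self.encoder(torch.cat([e, context], dim=-1), (h, c))

        # Decoder: encoder hidden state concatenated with Gaussian noise
        # initializes the decoder, which rolls out the future trajectory.
        z = torch.randn(num_peds, self.noise_dim)
        h_dec = self.init_decoder(torch.cat([h, z], dim=-1))
        c_dec = torch.zeros_like(h_dec)
        pos, preds = obs_traj[-1], []
        for _ in range(pred_len):
            h_dec, c_dec = self.decoder(self.embed(pos), (h_dec, c_dec))
            pos = pos + self.to_xy(h_dec)                     # predicted displacement
            preds.append(pos)
        return torch.stack(preds)                             # (pred_len, num_peds, 2)


# A new noise vector z is drawn on every call, so repeated sampling yields
# different, equally plausible futures for the same observed history, e.g.:
#   gen = AttenGANGenerator()
#   obs = torch.randn(8, 3, 2)               # 8 observed steps, 3 pedestrians
#   futures = [gen(obs) for _ in range(5)]
```

Because the noise is resampled on each forward pass, repeated sampling of the generator reproduces the multi-modal behaviour the abstract reports, with the discriminator (not shown) used only during training to push samples toward socially compliant trajectories.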
