Abstract
Deep learning is well suited to extracting features from images. To improve the stability of deep learning on Atari games, this paper proposes a deep neural network architecture based on model fusion, built on the combination of a convolutional neural network with a modified Q-learning algorithm from reinforcement learning. Experiments show that the new model learns the control policy effectively and matches or exceeds the scores of a standard deep reinforcement learning model on Atari games, which supports the stability and the advantage of model-fusion deep reinforcement learning in video games.