Deep reinforcement learning based resource allocation algorithm in cellular networks

Original title (Chinese): 基于深度强化学习的蜂窝网资源分配算法
Authors: LIAO Xiaomin (廖晓闽); YAN Shaohu (严少虎); SHI Jia (石嘉); TAN Zhenyu (谭震宇); ZHAO Zhongling (赵钟灵); LI Zan (李赞)
Affiliations: State Key Laboratory of Integrated Services Networks, Xidian University; School of Information and Communications, National University of Defense Technology; The 29th Research Institute of China Electronics Technology Group Corporation
Keywords: cellular networks; resource allocation; deep reinforcement learning; neural network
Journal (Chinese abbreviation): TXXB
Journal (English): Journal on Communications
Affiliations (original Chinese): 西安电子科技大学综合业务网理论及关键技术国家重点实验室; 国防科技大学信息通信学院; 中国电子科技集团公司第二十九研究所
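Publication date: 2019-02-25
Publisher: 通信学报 (Journal on Communications)
Year: 2019
Issue: v.40; No.382
Funding: Key Program of the National Natural Science Foundation of China (No.61631015)
Language: Chinese
Page: TXXB201902002
Page count: 8
CN: 02
ISSN: 11-2102/TN
Classification code: 15-22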

Abstract

To address the multi-objective optimization problem of resource allocation in cellular networks, a resource allocation algorithm based on deep reinforcement learning is proposed. First, a deep neural network (DNN) is built to optimize the transmission rate of the cellular system, which completes the forward transmission process of the algorithm. Then, with energy efficiency as the reward, the Q-learning mechanism is used to construct the error function, and gradient descent is used to train the DNN weights, which completes the backward training process of the algorithm. Simulation results show that the proposed algorithm can autonomously set how strongly the resource allocation scheme favors each objective, converges quickly, and clearly outperforms the other algorithms in optimizing transmission rate and system energy consumption.
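The backward training step described in the abstract (a Q-learning error function with energy efficiency as the reward, minimized by gradient descent) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the network architecture, the state and action definitions, or the exact reward formula. The two-feature state, the discrete power levels, the one-hidden-layer `TinyQNetwork`, the `energy_efficiency` formula (rate divided by transmit plus circuit power), and all hyperparameters below are assumptions chosen only to make the idea concrete.

```python
import math
import random


def energy_efficiency(gain, power, noise=1.0, circuit_power=0.5):
    """Assumed reward: achievable rate (bit/s/Hz) divided by total consumed power."""
    rate = math.log2(1.0 + gain * power / noise)
    return rate / (power + circuit_power)


class TinyQNetwork:
    """Hypothetical one-hidden-layer network: state vector -> one Q-value per action."""

    def __init__(self, n_in, n_hidden, n_actions, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                   for _ in range(n_actions)]
        self.b2 = [0.0] * n_actions

    def forward(self, state):
        """Forward pass; returns (hidden activations, Q-values)."""
        hidden = [math.tanh(sum(w * x for w, x in zip(row, state)) + b)
                  for row, b in zip(self.w1, self.b1)]
        q = [sum(w * h for w, h in zip(row, hidden)) + b
             for row, b in zip(self.w2, self.b2)]
        return hidden, q

    def train_step(self, state, action, reward, next_state, gamma=0.9, lr=0.05):
        """One gradient-descent step on 0.5 * (target - Q(s, a))^2.

        Returns the temporal-difference (TD) error before the update.
        """
        hidden, q = self.forward(state)
        _, q_next = self.forward(next_state)
        target = reward + gamma * max(q_next)  # Q-learning target, held constant
        td_error = target - q[action]

        # dLoss/dQ(s, a) = -td_error; backpropagate through tanh to the hidden layer.
        grad_hidden = [-td_error * self.w2[action][j] * (1.0 - hidden[j] ** 2)
                       for j in range(len(hidden))]

        # Output layer: only the row of the chosen action affects the loss.
        for j, h in enumerate(hidden):
            self.w2[action][j] += lr * td_error * h
        self.b2[action] += lr * td_error

        # Hidden layer.
        for j, g in enumerate(grad_hidden):
            for i, x in enumerate(state):
                self.w1[j][i] -= lr * g * x
            self.b1[j] -= lr * g
        return td_error


# Toy usage: hypothetical 2-feature state and 3 discrete transmit-power levels.
power_levels = [0.5, 1.0, 2.0]
net = TinyQNetwork(n_in=2, n_hidden=4, n_actions=len(power_levels))
state, next_state = [1.0, 0.5], [0.8, 0.6]
for _ in range(100):
    action = random.randrange(len(power_levels))
    reward = energy_efficiency(gain=state[0], power=power_levels[action])
    net.train_step(state, action, reward, next_state)
```

In this sketch the Q-learning target is treated as a constant during differentiation, which is the usual semi-gradient form of the update. A practical system would typically add an exploration policy and a realistic channel model, neither of which is specified in this record.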

References

[1] HUANG J, YIN Y, ZHAO Y, et al. A game-theoretic resource allocation approach for intercell device-to-device communications in cellular networks[J]. IEEE Transactions on Emerging Topics in Computing, 2016, 4(4): 475-486.
[2] WANG J, CHOU S. Secure strategy proof ascending-price spectrum auction[C]//IEEE Symposium on Privacy-Aware Computing. 2017: 96-106.
[3] YANG T, ZHANG R, CHENG X, et al. Graph coloring based resource sharing scheme (GCRS) for D2D communications underlaying full-duplex cellular networks[J]. IEEE Transactions on Vehicular Technology, 2017, 66(8): 7506-7517.
[4] TAKSHI H, DOĞAN G, ARSLAN H. Joint optimization of device to device resource and power allocation based on genetic algorithm[J]. IEEE Access, 2018, 6: 21173-21183.
[5] CHALLITA U, DONG L, SAAD W. Proactive resource management for LTE in unlicensed spectrum: a deep learning perspective[J]. IEEE Transactions on Wireless Communications, 2018, 17(7): 4674-4689.
[6] LEE W. Resource allocation for multi-channel underlay cognitive radio network based on deep neural network[J]. IEEE Communications Letters, 2018, 22(9): 1942-1945.
[7] LIU S, HU X, WANG W. Deep reinforcement learning based dynamic channel allocation algorithm in multibeam satellite systems[J]. IEEE Access, 2018, 6: 15733-15742.
[8] ZHAO H, ZHANG X, LIU M, et al. New power control scheme with maximum energy efficiency in wireless transmission[J]. Journal of Computer Application, 2013, 33(2): 365-368. (in Chinese)
[9] GAO X Z, HAN H C, YANG K, et al. Energy efficiency optimization for D2D communications based on SCA and GP method[J]. China Communications, 2017, 14(3): 66-74.
[10] SUTTON R S, BARTO A G. Reinforcement learning: an introduction[M]. Massachusetts: MIT Press, 2017.
[11] JIAO L C, ZHAO J, YANG S Y, et al. Deep learning, optimization and recognition[M]. Beijing: Tsinghua University Press, 2017. (in Chinese)
