Multi-agent reinforcement learning based wireless resource allocation algorithm for M2M communication
  • Title (English): Multi-agent reinforcement learning based resource allocation for M2M communication
  • Authors: XU Shaoyi; ZHENG Shanshan
  • Affiliation: School of Electronic and Information Engineering, Beijing Jiaotong University
  • Keywords: wireless communication; machine-to-machine; reinforcement learning; quality of experience; resource allocation
  • Journal: Journal of Beijing Jiaotong University (北京交通大学学报)
  • Publication date: 2018-10-15
  • Year: 2018
  • Issue: v.42; No.201
  • Funding: National Natural Science Foundation of China (61571038, 61471030); National Science and Technology Major Project (2016ZX03001011-004); Fundamental Research Funds for the Central Universities (2016JBZ003)
  • Language: Chinese
  • Record ID: BFJT201805001
  • CN: 11-5258/U
  • Pages: 5-13 (9 pages)
Abstract
Cellular networks are an ideal carrier for machine-to-machine (M2M) communication, benefiting from broad coverage, high reliability, and support for high-speed mobility. However, because machine-type communication (MTC) devices are numerous and carry diverse service types, existing wireless resource allocation algorithms are not fully applicable. To address this problem, and in contrast to traditional centralized algorithms, a distributed wireless resource allocation algorithm based on multi-agent reinforcement learning is proposed. MTC devices with reinforcement learning capability autonomously select resource blocks and power levels, aiming to achieve high quality of experience (QoE) at low power consumption. To enable collaboration among devices, multi-agent reinforcement learning is introduced, in which each agent can predict the strategies of the other agents. Simulation results show that the proposed algorithm performs well in terms of QoE, power consumption, and computational complexity.
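The scheme described in the abstract can be illustrated with a minimal sketch. This is not the paper's exact algorithm: the reward function, noise floor, power levels, and the congestion-prediction rule below are all hypothetical stand-ins. Each MTC device is an independent Q-learning agent choosing a (resource block, power level) pair; its reward is a SINR-like QoE proxy minus a power cost, and each agent keeps empirical counts of the other agents' resource-block choices as a simple fictitious-play-style substitute for predicting their strategies.

```python
import random
from collections import defaultdict

N_DEVICES = 4
N_RBS = 3                          # resource blocks
POWER_LEVELS = [0.5, 1.0, 2.0]     # hypothetical power levels (W)
ACTIONS = [(rb, p) for rb in range(N_RBS) for p in POWER_LEVELS]
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1  # learning rate, discount, exploration

def reward(i, all_actions):
    """QoE proxy: SINR-like ratio on the chosen block minus a power cost."""
    rb, p = all_actions[i]
    interference = sum(pj for j, (rbj, pj) in enumerate(all_actions)
                       if j != i and rbj == rb)
    sinr = p / (0.1 + interference)   # 0.1 = hypothetical noise floor
    return sinr - 0.5 * p             # trade QoE against power consumption

def run(episodes=2000, seed=0):
    rng = random.Random(seed)
    Q = [defaultdict(float) for _ in range(N_DEVICES)]
    # Empirical counts of each device's resource-block choices, used by the
    # others to predict congestion (stand-in for strategy prediction).
    rb_counts = [[1] * N_RBS for _ in range(N_DEVICES)]
    actions = [ACTIONS[0]] * N_DEVICES
    for _ in range(episodes):
        actions = []
        for i in range(N_DEVICES):
            if rng.random() < EPS:                 # explore
                actions.append(rng.choice(ACTIONS))
            else:                                  # exploit, penalizing
                def score(a, i=i):                 # blocks others favor
                    rb, _ = a
                    others = sum(rb_counts[j][rb]
                                 for j in range(N_DEVICES) if j != i)
                    return Q[i][a] - 0.01 * others
                actions.append(max(ACTIONS, key=score))
        for i, a in enumerate(actions):            # standard Q-update
            r = reward(i, actions)
            best_next = max(Q[i][b] for b in ACTIONS)
            Q[i][a] += ALPHA * (r + GAMMA * best_next - Q[i][a])
            rb_counts[i][a[0]] += 1
    return actions  # joint (resource block, power level) choice after learning

final = run()
print(final)
```

Because learning is fully distributed, each device updates only its own Q-table from its own reward; the shared `rb_counts` statistics are the only coordination signal, mirroring the paper's goal of collaboration without a centralized allocator.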
