Training and Software Simulation for ReRAM-Based LSTM Neural Network Acceleration
  • Title (English): Training and Software Simulation for ReRAM-Based LSTM Neural Network Acceleration
  • Authors: Liu He; Ji Yu; Han Jianhui; Zhang Youhui; Zheng Weimin
  • Affiliations: Department of Computer Science and Technology, Tsinghua University; Institute of Microelectronics, Tsinghua University
  • Keywords: ReRAM; long short-term memory (LSTM); training algorithm; simulation framework; neural network
  • Journal: Journal of Computer Research and Development (计算机研究与发展); database code: JFYZ
  • Publication date: 2019-06-15
  • Year: 2019; Volume: 56; Issue: 06
  • Pages: 52-61 (10 pages)
  • Record ID: JFYZ201906007
  • CN: 11-1777/TP
  • Funding: National Defense Science and Technology Innovation Zone project
  • Language: Chinese
Abstract
Long short-term memory (LSTM) networks are recurrent neural networks that excel at processing and predicting events with long intervals and delays in time series, and they are widely used in speech recognition, machine translation, and related fields. However, constrained by memory bandwidth, the computation patterns of most current neural-network accelerators cannot handle LSTM computation efficiently. The ReRAM crossbar structure, in contrast, can perform efficient, high-density matrix-vector multiplication as a form of in-memory computing, which makes it a highly promising accelerator design paradigm for LSTM processing. This paper studies a simulation tool for ReRAM-based LSTM neural-network accelerators and a corresponding neural-network training algorithm. The tool simulates, in a clock-driven fashion, designer-specified LSTM accelerator microarchitectures built around ReRAM crossbars as the core acceleration component, enabling design-space exploration; the training algorithm is adapted to the characteristics of ReRAM devices. The tool is implemented on top of SystemC, and its core computation is GPU-accelerated, which raises the simulation speed for ReRAM devices and facilitates design-space exploration.
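As an illustration of the in-memory matrix-vector multiplication the abstract describes, the sketch below maps a signed weight matrix onto a differential pair of crossbars with discrete positive conductances and recovers the product from the bit-line currents. All device parameters (G_MIN, G_MAX, number of levels) are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

G_MIN, G_MAX, LEVELS = 1e-6, 1e-4, 256  # assumed conductance range and device states

def map_to_conductance(w):
    """Map a signed weight matrix onto two crossbars of positive
    conductances (differential encoding), quantized to LEVELS states."""
    w_max = float(np.abs(w).max()) or 1.0

    def quantize(x):  # x in [0, 1] -> nearest of LEVELS discrete conductances
        return G_MIN + np.round(x * (LEVELS - 1)) / (LEVELS - 1) * (G_MAX - G_MIN)

    g_pos = quantize(np.clip(w, 0, None) / w_max)   # positive weight part
    g_neg = quantize(np.clip(-w, 0, None) / w_max)  # negative weight part
    return g_pos, g_neg, w_max

def crossbar_mvm(v, g_pos, g_neg, w_max):
    """Analog MVM: voltages v drive the rows, each column accumulates
    current I = G^T v; subtracting the two differential currents and
    rescaling recovers y ~= w^T v in the weight domain."""
    return (g_pos - g_neg).T @ v * w_max / (G_MAX - G_MIN)
```

A cycle-accurate simulator such as the one described would wrap a kernel like this with DAC/ADC quantization, device non-idealities, and timing; here only the conductance mapping and current summation are shown.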
Long short-term memory (LSTM) networks are widely used in speech recognition, machine translation, and related fields, owing to their strength in processing and predicting events with long intervals and delays in time series. However, most existing neural-network acceleration chips cannot perform LSTM computation efficiently, limited as they are by memory bandwidth. ReRAM-based crossbars, on the other hand, can process matrix-vector multiplication efficiently thanks to their processing-in-memory (PIM) nature. Yet a software tool for broad architectural exploration and end-to-end evaluation of ReRAM-based LSTM acceleration has been missing. This paper proposes a simulator for ReRAM-based LSTM neural-network acceleration and a corresponding training algorithm. The main features (including imperfections) of ReRAM devices and circuits are reflected in the highly configurable tool, and the core computation of the simulation can be accelerated on a general-purpose graphics processing unit (GPGPU). Moreover, the core component of the simulator has been verified against the circuit simulation of a real chip design. Within this framework, architectural exploration and comprehensive end-to-end evaluation can be achieved.
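The abstract does not spell out the "training algorithm adapted to ReRAM characteristics". A common device-aware scheme (in the spirit of variation-aware training, cf. reference [8]) keeps full-precision master weights but runs the forward pass through quantized, noise-perturbed weights. The sketch below illustrates that idea on a linear layer; LEVELS, SIGMA, and the straight-through update are assumptions for illustration, not the paper's method.

```python
import numpy as np

LEVELS, SIGMA = 16, 0.02  # assumed device states and programming-noise level

def device_view(w, rng):
    """Project full-precision weights onto discrete conductance levels
    and add multiplicative programming noise, mimicking what a ReRAM
    crossbar would actually store."""
    w_max = float(np.abs(w).max()) or 1.0
    step = 2 * w_max / (LEVELS - 1)          # signed range [-w_max, w_max]
    w_q = np.round(w / step) * step          # nearest device level
    return w_q * (1 + SIGMA * rng.standard_normal(w.shape))

def train_step(w, x, y, rng, lr=0.1):
    """One SGD step: the forward pass sees the device view of the
    weights, but the gradient updates the full-precision master copy
    (a straight-through-style estimator)."""
    w_dev = device_view(w, rng)
    grad = x.T @ (x @ w_dev - y) / len(x)    # least-squares gradient
    return w - lr * grad
```

Training through the device view lets the learned weights compensate for quantization and variation, rather than mapping an ideally trained network onto imperfect devices after the fact.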
References
[1] Jouppi N P, Young C, Patil N, et al. In-datacenter performance analysis of a tensor processing unit [C] //Proc of the 44th Annual Int Symp on Computer Architecture. New York: ACM, 2017: 1-12
[2] Xcelerit. Benchmarks: Deep Learning Nvidia P100 vs V100 GPU [OL]. [2017-11-27]. https://www.xcelerit.com/computing-benchmarks/insights/benchmarks-deep-learning-nvidia-p100-vs-v100-gpu/
[3] Han Song, Kang Junlong, Mao Huizi, et al. ESE: Efficient speech recognition engine with sparse LSTM on FPGA [C] //Proc of the 2017 ACM/SIGDA Int Symp on Field-Programmable Gate Arrays. New York: ACM, 2017: 75-84
[4] Wang Shuo, Li Zhe, Ding Caiwen, et al. C-LSTM: Enabling efficient LSTM using structured compression techniques on FPGAs [C] //Proc of the 2018 ACM/SIGDA Int Symp on Field-Programmable Gate Arrays. New York: ACM, 2018: 11-20
[5] Shafiee A, Nag A, Muralimanohar N, et al. ISAAC: A convolutional neural network accelerator with in-situ analog arithmetic in crossbars [C] //Proc of the 43rd Annual Int Symp on Computer Architecture. Piscataway, NJ: IEEE, 2016: 14-26
[6] Hochreiter S, Schmidhuber J. Long short-term memory [J]. Neural Computation, 1997, 9(8): 1735-1780
[7] Evangelopoulos G N. Efficient hardware mapping of long short-term memory neural networks for automatic speech recognition [D]. Belgium: KU Leuven, 2016
[8] Liu Beiye, Li Hai, Chen Yiran, et al. Vortex: Variation-aware training for memristor x-bar [C] //Proc of the 52nd ACM/EDAC/IEEE Design Automation Conf (DAC). Piscataway, NJ: IEEE, 2015: 1-6
[9] Tang Tianqi, Xia Lixue, Li Boxun, et al. Binary convolutional neural network on RRAM [C] //Proc of the 22nd Asia and South Pacific Design Automation Conf. Piscataway, NJ: IEEE, 2017: 782-787
[10] Song Linghao, Qian Xuehai, Li Hai, et al. PipeLayer: A pipelined ReRAM-based accelerator for deep learning [C] //Proc of the 2017 IEEE Int Symp on High Performance Computer Architecture. Piscataway, NJ: IEEE, 2017: 541-552
[11] Dong Xiangyu, Xu Cong, Xie Yuan, et al. NVSim: A circuit-level performance, energy, and area model for emerging nonvolatile memory [J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2012, 31(7): 994-1007
[12] Xu Sheng, Chen Xiaoming, Wang Ying, et al. PIMSim: A flexible and detailed processing-in-memory simulator [J]. IEEE Computer Architecture Letters, 2018, 18(1): 6-9
[13] Chen Paiyu, Peng Xiaochen, Yu Shimeng. NeuroSim: A circuit-level macro model for benchmarking neuro-inspired architectures in online learning [J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2018, 37(12): 3067-3080
[14] Xia Lixue, Li Boxun, Tang Tianqi, et al. MNSIM: Simulation platform for memristor-based neuromorphic computing system [J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2018, 37(5): 1009-1022
[15] Yao Peng, Wu Huaqiang, Gao Bin, et al. Face classification using electronic synapses [J]. Nature Communications, 2017, 8: 15199
[16] Muralimanohar N, Balasubramonian R, Jouppi N P. Optimizing NUCA organizations and wiring alternatives for large caches with CACTI 6.0 [C] //Proc of the 40th Annual IEEE/ACM Int Symp on Microarchitecture (MICRO-40). Piscataway, NJ: IEEE, 2007: 3-14
[17] Long Yun, Na T, Mukhopadhyay S. ReRAM-based processing-in-memory architecture for recurrent neural network acceleration [J]. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2018, 26(12): 2781-2794
[18] Gu Peng, Li Boxun, Tang Tianqi, et al. Technological exploration of RRAM crossbar array for matrix-vector multiplication [J]. Journal of Computer Science and Technology, 2016, 31(1): 3-19
[19] Lee S R, Kim Y B, Chang M, et al. Multi-level switching of triple-layered TaOx RRAM with excellent reliability for storage class memory [C] //Proc of the 2012 Symp on VLSI Technology. Piscataway, NJ: IEEE, 2012: 71-72
① This conclusion was drawn from a comparison with the circuit design of a real chip.
