Design and Implementation of Convolutional Neural Network Based on FPGA

Authors: JIANG Lin; WANG Xi-juan; LIU Zhen-tao; XIE Xiao-yan; HENG Qian
Keywords: convolutional neural network; field-programmable gate array; array processor; parallelism
Journal: Microelectronics & Computer (CNKI journal code: WXYJ)
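Affiliations: College of Electronic Engineering, Xi'an University of Posts and Telecommunications; College of Computer Science, Xi'an University of Posts and Telecommunications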
Publication date: 2018-08-05
Year: 2018
Volume: 35 (cumulative issue No. 411)
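Funding: National Natural Science Foundation of China (61772417, 61602377, 61634004, 61272120); Shaanxi Provincial Science and Technology Coordinated Innovation Project (2016KTZDGY02-04-02); Shaanxi Provincial Key Research and Development Program (2017GY-060)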
Language: Chinese
Issue: 08
Pages: 138-142
Page count: 5
CN: 61-1123/TN
CNKI file ID: WXYJ201808028

Abstract

Convolutional neural networks (CNNs) have achieved great success in a wide range of computer vision applications. This paper studies the parallel structure of convolutional neural networks and, exploiting the multiple forms of parallelism present in network computation, proposes an architecture that computes the CNN forward-propagation process in parallel on an FPGA. Experimental results show that at an operating frequency of 110 MHz, the architecture enables the FPGA to reach a peak computing speed of 0.48 GOP/s, a 23.5× speedup over an ARM Mali-T628 GPU platform.
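The abstract does not state which loops of the convolution the architecture maps onto parallel hardware. For orientation only, the minimal Python sketch below (NumPy assumed; the names `conv_forward` and `conv_ops` are hypothetical, not taken from the paper) writes one convolutional layer as a direct loop nest with stride 1 and no padding, and its comments mark the loops whose iterations are independent, which is where FPGA CNN designs commonly unroll hardware. It also divides the reported peak throughput by the clock frequency to get the implied operations per cycle. `conv_ops` counts one multiply-accumulate as two operations, a common convention that is assumed here rather than confirmed by the paper.

```python
import numpy as np


def conv_forward(x, weights, bias):
    """Direct convolution, stride 1, no padding (hypothetical helper).

    x:       input feature maps, shape (C_in, H_in, W_in)
    weights: kernels, shape (C_out, C_in, K, K)
    bias:    one bias per output map, shape (C_out,)
    returns: output feature maps, shape (C_out, H_in-K+1, W_in-K+1)
    """
    c_out, c_in, k, _ = weights.shape
    _, h_in, w_in = x.shape
    h_out, w_out = h_in - k + 1, w_in - k + 1
    y = np.empty((c_out, h_out, w_out))
    for m in range(c_out):          # output maps are independent of each other
        for r in range(h_out):      # every output pixel is independent as well
            for c in range(w_out):
                acc = bias[m]
                for n in range(c_in):   # partial sums over input maps: parallel products + adder tree
                    for i in range(k):  # the K*K products inside one kernel window are independent
                        for j in range(k):
                            acc += weights[m, n, i, j] * x[n, r + i, c + j]
                y[m, r, c] = acc
    return y


def conv_ops(c_in, c_out, h_out, w_out, k):
    """Operation count of one layer, counting a multiply-accumulate as 2 ops (assumed)."""
    return 2 * c_in * c_out * h_out * w_out * k * k


# Small example: 2 input maps of 4x4, 3 output maps, 3x3 kernels of ones.
x = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)
weights = np.ones((3, 2, 3, 3))
bias = np.zeros(3)
y = conv_forward(x, weights, bias)
print(y.shape)                  # prints (3, 2, 2)
print(conv_ops(2, 3, 2, 2, 3))  # prints 432

# Figures reported in the abstract.
peak_ops_per_s = 0.48e9
clock_hz = 110e6
ops_per_cycle = peak_ops_per_s / clock_hz
print(f"{ops_per_cycle:.2f} ops/cycle")  # prints 4.36 ops/cycle
```

At 110 MHz, 0.48 GOP/s corresponds to roughly 4.36 operations per clock cycle, or about 2.2 multiply-accumulates per cycle under the two-operations-per-MAC assumption.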
