FPGA Parallel Structure Design of the Convolutional Neural Network (CNN) Algorithm

Original title (Chinese): 卷积神经网络(CNN)算法的FPGA并行结构设计
Authors: WANG Wei (王巍); ZHOU Kai-li (周凯利); WANG Yi-chang (王伊昌); WANG Guang (王广); YANG Zheng-lin (杨正琳); YUAN Jun (袁军)
Affiliation: College of Electronics Engineering/International Semiconductor College, Chongqing University of Posts and Telecommunications
Keywords: convolutional neural network; field-programmable gate array (FPGA); parallel structure; pipeline
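Journal (Chinese name, abbreviated): WXYJ
Journal (English name): Microelectronics & Computer
Institution: College of Optoelectronic Engineering/International Semiconductor College, Chongqing University of Posts and Telecommunications (重庆邮电大学光电工程学院/国际半导体学院)
Publication date: 2019-04-05
Publisher: Microelectronics & Computer (微电子学与计算机)
Year: 2019
Issue: v.36; No.419
Funding: National Natural Science Foundation of China (61404019); Chongqing Basic and Frontier Research Program (cstc2016jcyjA0272)
Language: Chinese
Page: WXYJ201904012
Page count: 7
CN: 04
ISSN: 61-1123/TN
Classification number: 63-68+72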

Abstract

This paper presents an FPGA parallel architecture for the CNN algorithm. The design first exploits the inherent parallelism of CNN computation, together with loop transformations, to build a convolution circuit that supports efficient parallel, pipelined execution. Double buffering, which reduces memory access time, is then used to implement buffer arrays at the input and output, improving the circuit's computational performance (measured in GOPS, giga-operations per second). The activation function is also optimized: a hardware circuit for the sigmoid activation is designed using piecewise fitting that combines a lookup table with polynomials, so that the approximated activation function does not degrade accuracy. Experimental results show that, with a 150 MHz input clock, the computational performance of the overall circuit improves from 15.87 GOPS to 20.62 GOPS, and the recognition rate on the MNIST dataset reaches 98.81%.
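The piecewise sigmoid fitting mentioned in the abstract can be illustrated with a small software model. The abstract does not give the paper's segment boundaries, polynomial order, or fixed-point format, so everything below is an assumption made for illustration: 16 uniform segments on [0, 8), one quadratic per segment stored in a lookup table, saturation beyond 8, and the symmetry sigmoid(-x) = 1 - sigmoid(x) for negative inputs. The names `build_table` and `sigmoid_approx` are hypothetical, and floating-point arithmetic stands in for the hardware's fixed-point datapath.

```python
import math

X_MAX = 8.0   # assumed saturation point: output is treated as 1.0 for x >= X_MAX
N_SEG = 16    # assumed number of uniform segments on [0, X_MAX)


def sigmoid(x: float) -> float:
    """Reference sigmoid used to build the table and to measure the error."""
    return 1.0 / (1.0 + math.exp(-x))


def build_table(x_max: float = X_MAX, n_seg: int = N_SEG):
    """Precompute quadratic coefficients (a, b, c) for each segment.

    Each quadratic interpolates the sigmoid at the start, midpoint and end
    of its segment, expressed in the local coordinate t = x - segment_start.
    """
    width = x_max / n_seg
    h = width / 2.0
    table = []
    for i in range(n_seg):
        x0 = i * width
        y0, y1, y2 = sigmoid(x0), sigmoid(x0 + h), sigmoid(x0 + width)
        a = (y0 - 2.0 * y1 + y2) / (2.0 * h * h)
        b = (-3.0 * y0 + 4.0 * y1 - y2) / (2.0 * h)
        table.append((a, b, y0))
    return table, width


TABLE, WIDTH = build_table()


def sigmoid_approx(x: float) -> float:
    """Lookup-table plus polynomial approximation of the sigmoid."""
    ax = abs(x)
    if ax >= X_MAX:
        y = 1.0                      # saturated region
    else:
        i = int(ax / WIDTH)          # table index (address bits in hardware)
        t = ax - i * WIDTH           # offset inside the segment
        a, b, c = TABLE[i]
        y = (a * t + b) * t + c      # Horner form: two multiplies, two adds
    return y if x >= 0 else 1.0 - y  # symmetry handles negative inputs


if __name__ == "__main__":
    grid = [k / 100.0 for k in range(-1200, 1201)]
    worst = max(abs(sigmoid_approx(x) - sigmoid(x)) for x in grid)
    print(f"max abs error on [-12, 12]: {worst:.2e}")
```

With these assumed settings the approximation stays within 1e-3 of the reference sigmoid over [-12, 12]. A hardware version would additionally quantize the inputs and the stored coefficients, which this sketch does not model.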

References

[1] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[C]// International Conference on Neural Information Processing Systems. Curran Associates Inc., 2012:1097-1105.
[2] GHAFFARI S, SHARIFIAN S. FPGA-based convolutional neural network accelerator design using high level synthesize[C]// International Conference of Signal Processing and Intelligent Systems. IEEE, 2017:1-6.
[3] GOKHALE V, JIN J, DUNDAR A, et al. A 240 G-ops/s mobile coprocessor for deep neural networks[C]// Computer Vision and Pattern Recognition Workshops. IEEE, 2014:696-701.
[4] CHEN Y H, KRISHNA T, EMER J, et al. 14.5 Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks[C]// IEEE International Solid-State Circuits Conference. IEEE, 2016:262-263.
[5] FARABET C, POULET C, HAN J Y, et al. CNP: An FPGA-based processor for convolutional networks[C]// International Conference on Field Programmable Logic and Applications. IEEE, 2009:32-37.
[6] FENG G, HU Z, CHEN S, et al. Energy-efficient and high-throughput FPGA-based accelerator for convolutional neural networks[C]// IEEE International Conference on Solid-State and Integrated Circuit Technology. IEEE, 2017:624-626.
[7] ZHANG C, LI P, SUN G, et al. Optimizing FPGA-based accelerator design for deep convolutional neural networks[C]// Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. ACM, 2015:161-170.
[8] ZHOU Y, JIANG J. An FPGA-based accelerator implementation for deep convolutional neural networks[C]// International Conference on Computer Science and Network Technology. IEEE, 2016:829-832.
[9] JIANG J, HU R, LUJAN M. A flexible memory controller supporting deep belief networks with fixed-point arithmetic[C]// Parallel and Distributed Processing Symposium Workshops & PhD Forum. IEEE, 2013:144-152.
