Abstract
Convolutional neural networks (CNNs) have achieved great success in a wide range of computer vision applications. This paper studies the parallel structure of convolutional neural networks and, based on the several forms of parallelism in network computation, proposes an architecture for computing CNN forward propagation in parallel on an FPGA. Experimental results show that, at an operating frequency of 110 MHz, the architecture reaches a peak FPGA throughput of 0.48 GOP/s, a 23.5× speedup over the ARM Mali-T628 GPU platform.