Abstract
To address the long computation times caused by the high complexity of convolution operations in convolutional neural networks, this paper proposes an FPGA implementation of a configurable CNN co-accelerator with an eight-stage pipeline structure. By embedding the pooling controller within the convolution controller, the computational module obtains more resources for reuse. Additionally, a mirror-tree structure is designed to increase parallelism, and the Map algorithm is applied to raise computational density while simultaneously speeding up calculation. Experimental results show that the implementation achieves a computing performance of 22.74 GOPS at 32-bit fixed-/floating-point precision. Compared with the MAPLE accelerator, computational density is increased by 283.3% and computation speed by 224.9%; compared with the MCA (Memory-Centric Accelerator), computational density is increased by 14.47% and computation speed by 33.76%. At 8- to 16-bit fixed-point precision, performance reaches 58.3 GOPS, and computational density is increased by 8.5% compared with the LBA (Layer-Based Accelerator).
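The abstract does not spell out the mirror-tree reduction, but in FPGA convolution engines a "tree" reduction conventionally means summing the kernel's partial products through a balanced adder tree (log2(n) adder stages) rather than a sequential accumulator chain. A minimal software model of that idea, under that assumption (the names `tree_sum` and `conv2d_valid` are illustrative, not from the paper), might look like:

```python
def tree_sum(values):
    """Sum a list by pairwise reduction, mirroring a hardware adder tree:
    log2(n) adder stages instead of a sequential accumulator chain."""
    vals = list(values)
    while len(vals) > 1:
        paired = [vals[i] + vals[i + 1] for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:  # an odd element passes through to the next stage
            paired.append(vals[-1])
        vals = paired
    return vals[0]


def conv2d_valid(image, kernel):
    """'Valid' 2-D convolution: each output pixel sums k*k partial products
    through the tree reducer (hardware would evaluate each stage in parallel)."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    out = [[0] * ow for _ in range(oh)]
    for r in range(oh):
        for c in range(ow):
            products = [image[r + i][c + j] * kernel[i][j]
                        for i in range(kh) for j in range(kw)]
            out[r][c] = tree_sum(products)
    return out
```

In hardware, each `while` iteration of `tree_sum` corresponds to one adder stage, which is what lets a pipelined design accept a new window of partial products every cycle.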
References
[1] Farabet C,Poulet C,Han J Y,LeCun Y.CNP:An FPGA-based processor for convolutional networks[A].2009 International Conference on Field Programmable Logic and Applications[C].Prague:IEEE,2009.32-37.
[2] Ji S,Xu W,Yang M,Yu K.3D convolutional neural networks for human action recognition[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2013,35(1):221-231.
[3] Larochelle H,Erhan D,Courville A,Bergstra J,Bengio Y.An empirical evaluation of deep architectures on problems with many factors of variation[A].Proceedings of the 24th International Conference on Machine Learning[C].New York:ACM,2007.473-480.
[4] Sankaradas M,et al.A massively parallel coprocessor for convolutional neural networks[A].The 20th IEEE International Conference on Application-Specific Systems,Architectures and Processors[C].Boston:IEEE,2009.53-60.
[5] Bengio Y,Courville A,Vincent P.Representation learning:a review and new perspectives[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2013,35(8):1798-1828.
[6] Zhou Yongmei,Jiang Jingfei.An FPGA-based accelerator implementation for deep convolutional neural networks[A].The 4th International Conference on Computer Science and Network Technology (ICCSNT)[C].China:IEEE,2015.829-832.
[7] Chen Y,et al.DaDianNao:a machine-learning supercomputer[A].The 47th Annual IEEE/ACM International Symposium on Microarchitecture[C].Cambridge:IEEE,2014.609-622.
[8] Roux S,Mamalet F,Garcia C.Embedded convolutional face finder[A].2006 IEEE International Conference on Multimedia and Expo[C].Canada:IEEE,2006.285-288.
[9] Kamijo S,Matsushita Y,Ikeuchi K,et al.Traffic monitoring and accident detection at intersections[J].IEEE Transactions on Intelligent Transportation Systems,2000,1(2):108-118.
[10] Li N,Takaki S,Tomiokay Y,Kitazawa H.A multistage dataflow implementation of a deep convolutional neural network based on FPGA for high-speed object recognition[A].2016 IEEE Southwest Symposium on Image Analysis and Interpretation (SSIAI)[C].USA:IEEE,2016.165-168.
[11] Cadambi S,Majumdar A,Becchi M,Chakradhar S,Graf H P.A programmable parallel accelerator for learning and classification[A].Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques[C].Austria:ACM,2010.273-284.
[12] Chakradhar S,Sankaradas M,Jakkula V,Cadambi S.A dynamically configurable coprocessor for convolutional neural networks[J].ACM SIGARCH Computer Architecture News,2010,38(3):247-257.
[13] Peemen M,Setio A A,Mesman B,Corporaal H.Memory-centric accelerator design for convolutional neural networks[A].IEEE 31st International Conference on Computer Design (ICCD) [C].USA:IEEE,2013.13-19.
[14] Krizhevsky A,Sutskever I,Hinton G E.ImageNet classification with deep convolutional neural networks[J].Communications of the ACM,2017,60(6):84-90.
[15] Zhang C,Li P,Sun G,et al.Optimizing FPGA-based accelerator design for deep convolutional neural networks[A].Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays[C].USA:ACM,2015.161-170.
[16] Huang C,Ni S,Chen G.A layer-based structured design of CNN on FPGA[A].IEEE 12th International Conference on ASIC (ASICON)[C].Guiyang:IEEE,2017.1037-1040.