Accelerated Implementation Method of Image Recognition Algorithm Based on Convolutional Neural Network

Original title: 基于卷积神经网络图像识别算法的加速实现方法
Authors: Qin Donghui (秦东辉); Zhou Hui (周辉); Zhao Xiongbo (赵雄波); Liu Zhu (柳柱)
Affiliations: Beijing Aerospace Automatic Control Institute; National Aerospace Intelligence Control Technology Laboratory
Keywords: Convolutional neural network; Field programmable gate array (FPGA); Hardware acceleration; SDSoC
Journal code: HTKZ
Journal: Aerospace Control (航天控制)
Publication date: 2019-02-15
Publisher: Aerospace Control (航天控制)
Year: 2019
Issue: v.37; No.177
Language: Chinese
Page: HTKZ201901004
Page count: 6
CN: 01
ISSN: 11-1989/V
Classification number: 22-27

Abstract

Convolutional neural network algorithms are becoming increasingly complex. Software implementations on general-purpose processors struggle to meet the real-time requirements of practical applications, while GPU-based implementations consume too much energy and cannot be deployed in embedded systems. This paper proposes an FPGA-based convolutional neural network accelerator implemented with high-level synthesis (HLS). Using the SDSoC development environment, the design achieves the required performance while saving substantial development time. Experimental results show that, for a 64×64×3 input image, the proposed hardware/software co-design recognizes an image in 1.86 ms, compared with 266 ms for the CPU implementation, a speedup of about 143×, with an 88-fold saving in power consumption.

References

[1] LeCun Y, Boser B, Denker J S, Henderson D, Howard R E, Hubbard W, Jackel L D. Backpropagation Applied to Handwritten Zip Code Recognition[J]. Neural Computation, 1989, 1(4): 541-551.
[2] Zhang C, Li P, Sun G, et al. Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks[C]//Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. ACM, 2015: 161-170.
[3] Krizhevsky A, Sutskever I, Hinton G E. ImageNet Classification with Deep Convolutional Neural Networks[C]//International Conference on Neural Information Processing Systems. Curran Associates Inc., 2012.
[4] He K, Zhang X, Ren S, Sun J. Delving Deep into Rectifiers: Surpassing Human-level Performance on ImageNet Classification[C]//ICCV, 2015.
[5] Girshick R, Donahue J, Darrell T, Malik J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation[C]//CVPR, 2014.
[6] Wu Jiang, Zhu Zhiyu. The Module-Level Pipelining Design of SIRF Based on FPGA[J]. Aerospace Control, 2014, 32(4): 19-23, 36. (in Chinese)
[7] Qiu J, Wang J, Yao S, et al. Going Deeper with Embedded FPGA Platform for Convolutional Neural Network[C]//Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. ACM, 2016: 26-35.
[8] Yu Zijian, Ma De, Yan Xiaolang, et al. FPGA-Based Accelerator for Convolutional Neural Network[J]. Computer Engineering, 2017, 43(1): 109-114, 119. (in Chinese)
[9] Fang Rui, Liu Jiahe, Xue Zhihui, et al. FPGA-based Design for Convolution Neural Network[J]. Computer Engineering and Applications, 2015, 51(8): 32-36.
[10] Wang Kun, Zhou Hua. System Design and Hardware Realization of Convolution Neural Network System in Deep Learning[J]. Application of Electronic Technique, 2018, 44(5): 56-59.
[11] Liang Y, Rupnow K, Li Y, et al. High-level Synthesis: Productivity, Performance, and Software Constraints[C]. ECE, 2012.