Optimizing OpenCL Implementation of Deep Convolutional Neural Network on FPGA

Yuran Qiao; Junzhong Shen; Dafei Huang; Qianming Yang; Mei Wen; Chunyuan Zhang

doi:10.1007/978-3-319-68210-5_9

Conference Papers Year : 2017

Affiliation (all authors) : National University of Defense Technology [China]

Abstract

Nowadays, the rapid growth of data across the Internet has provided sufficient labeled data to train deep structured artificial neural networks. While deeper networks bring significant precision gains in many applications, they also demand higher computation capacity at the expense of power consumption. To this end, various FPGA-based deep neural network accelerators have been proposed for higher performance and lower energy consumption. However, the development cycle of an FPGA application is much longer than that of a CPU or GPU application. Although FPGA vendors such as Altera and Xilinx have released OpenCL frameworks to ease programming, tuning OpenCL code for desirable performance on FPGAs is still challenging. In this paper, we look into the OpenCL implementation of a Convolutional Neural Network (CNN) on FPGA. By analysing how a CPU/GPU-oriented version executes on the FPGA, we identify the causes of the performance difference between FPGA and CPU/GPU and locate the performance bottlenecks. Based on this analysis, we put forward a corresponding optimization method focusing on external memory transfers. We implement a prototype system on an Altera Stratix V A7 FPGA, which achieves a 4.76× speedup over the original version. To the best of our knowledge, this implementation outperforms most previous OpenCL implementations on FPGA by a large margin.

Domains

Computer Science [cs]

Main file

457609_1_En_9_Chapter.pdf (507.61 KB)

Origin : Files produced by the author(s)
Licence : CC BY 4.0 - Attribution

https://inria.hal.science/hal-01705448

Submitted on : Friday, February 9, 2018-2:26:40 PM

Last modification on : Tuesday, September 3, 2019-3:04:02 PM

Long-term archiving on : Friday, May 4, 2018-12:11:32 AM

Dates and versions

hal-01705448 , version 1 (09-02-2018)

Identifiers

HAL Id : hal-01705448 , version 1
DOI : 10.1007/978-3-319-68210-5_9

Cite

Yuran Qiao, Junzhong Shen, Dafei Huang, Qianming Yang, Mei Wen, et al. Optimizing OpenCL Implementation of Deep Convolutional Neural Network on FPGA. 14th IFIP International Conference on Network and Parallel Computing (NPC), Oct 2017, Hefei, China. pp.100-111, ⟨10.1007/978-3-319-68210-5_9⟩. ⟨hal-01705448⟩
