%0 Conference Proceedings
%T Compiler-Assisted Operator Template Library for DNN Accelerators
%+ CAS Institute of Computing Technology (ICT)
%+ University of Chinese Academy of Sciences [Beijing] (UCAS)
%A Li, Jiansong
%A Cao, Wei
%A Dong, Xiao
%A Li, Guangli
%A Wang, Xueying
%A Liu, Lei
%A Feng, Xiaobing
%Z Part 1: Accelerator
%< peer-reviewed
%( Lecture Notes in Computer Science
%B 17th IFIP International Conference on Network and Parallel Computing (NPC)
%C Zhengzhou, China
%Y Xin He
%Y En Shao
%Y Guangming Tan
%I Springer International Publishing
%3 Network and Parallel Computing
%V LNCS-12639
%P 3-16
%8 2020-09-28
%D 2020
%R 10.1007/978-3-030-79478-1_1
%K DNN accelerators
%K Template library
%K Address space management
%Z Computer Science [cs]
%Z Conference papers
%X Although dedicated accelerators are gaining popularity in the deep neural network (DNN) domain for their performance and energy efficiency, high-level programming support for them remains thin. In contrast to existing research targeting whole DNNs, we examine this problem at a finer granularity: individual operators. Due to performance concerns, operator programmers may have to resort to hand-written assembly as their first choice, which is error-prone and involves many programming chores. To alleviate this problem, we propose TOpLib, a compiler-assisted template library. By providing a unified user-view abstraction, TOpLib allows programmers to express computational kernels with high-level tensor primitives, which are automatically lowered into low-level intrinsic primitives via expression templates. Moreover, because memory management is performance-critical and the optimization strategy of expression templates is limited to enumeration-based rewriting rules, we implement TOpLib with a compiler-assisted approach. We handle the memory reuse challenges in the compiler, which allows TOpLib to make full use of on-chip buffers and results in better performance. Experiments over 55 typical DNN operators demonstrate that TOpLib can generate scalable code whose performance is better than or on par with hand-written assembly versions.
%G English
%Z TC 10
%Z WG 10.3
%2 https://inria.hal.science/hal-03768760/document
%2 https://inria.hal.science/hal-03768760/file/511910_1_En_1_Chapter.pdf
%L hal-03768760
%U https://inria.hal.science/hal-03768760
%~ IFIP-LNCS
%~ IFIP
%~ IFIP-TC
%~ IFIP-TC10
%~ IFIP-NPC
%~ IFIP-WG10-3
%~ IFIP-LNCS-12639