%0 Conference Proceedings
%T Dynamic GMMU Bypass for Address Translation in Multi-GPU Systems
%+ College of Computer Science [Changsha]
%A Wei, Jinhui
%A Lu, Jianzhuang
%A Yu, Qi
%A Li, Chen
%A Zhao, Yunping
%Z Part 3: Algorithm
%< peer-reviewed
%( Lecture Notes in Computer Science
%B 17th IFIP International Conference on Network and Parallel Computing (NPC)
%C Zhengzhou, China
%Y Xin He
%Y En Shao
%Y Guangming Tan
%I Springer International Publishing
%3 Network and Parallel Computing
%V LNCS-12639
%P 147-158
%8 2020-09-28
%D 2020
%R 10.1007/978-3-030-79478-1_13
%K Multi-GPU system
%K Memory virtualization
%K Address translation architecture
%Z Computer Science [cs]
%Z Conference papers
%X The ever-increasing application footprint raises challenges for GPUs. As Moore’s Law reaches its limit, it is not easy to improve single-GPU performance any further; instead, multi-GPU systems have been shown to be a promising solution due to their GPU-level parallelism. In addition, memory virtualization in recent GPUs simplifies multi-GPU programming. Memory virtualization requires support for address translation, and the overhead of address translation has an important impact on system performance. Currently, two address translation architectures are common in multi-GPU systems: distributed and centralized. We find that both architectures suffer from performance loss in certain cases. To address this issue, we propose GMMU Bypass, a technique that allows address translation requests to dynamically bypass the GMMU in order to reduce translation overhead. Simulation results show that our technique outperforms the distributed address translation architecture by 6% and the centralized address translation architecture by 106% on average.
%G English
%Z TC 10
%Z WG 10.3
%2 https://inria.hal.science/hal-03768734/document
%2 https://inria.hal.science/hal-03768734/file/511910_1_En_13_Chapter.pdf
%L hal-03768734
%U https://inria.hal.science/hal-03768734
%~ IFIP-LNCS
%~ IFIP
%~ IFIP-TC
%~ IFIP-TC10
%~ IFIP-NPC
%~ IFIP-WG10-3
%~ IFIP-LNCS-12639