%0 Conference Proceedings
%T NUMA-Aware Optimization of Sparse Matrix-Vector Multiplication on ARMv8-Based Many-Core Architectures
%+ China University of Petroleum Beijing (CUP)
%+ National University of Defense Technology [China]
%A Yu, Xiaosong
%A Ma, Huihui
%A Qu, Zhengyu
%A Fang, Jianbin
%A Liu, Weifeng
%Z Part 4: Architecture and Hardware
%< peer-reviewed
%( Lecture Notes in Computer Science
%B 17th IFIP International Conference on Network and Parallel Computing (NPC)
%C Zhengzhou, China
%Y Xin He
%Y En Shao
%Y Guangming Tan
%I Springer International Publishing
%3 Network and Parallel Computing
%V LNCS-12639
%P 231-242
%8 2020-09-28
%D 2020
%R 10.1007/978-3-030-79478-1_20
%K Sparse matrix-vector multiplication
%K NUMA architecture
%K Hypergraph partitioning
%K Phytium 2000+
%Z Computer Science [cs]
%Z Conference papers
%X As a fundamental operation, sparse matrix-vector multiplication (SpMV) plays a key role in solving a number of scientific and engineering problems. This paper presents a NUMA-aware optimization technique for the SpMV operation on the Phytium 2000+ ARMv8-based 64-core processor. We first provide a performance evaluation of the NUMA architecture of the Phytium 2000+ processor, then reorder the input sparse matrix with hypergraph partitioning for better cache locality, and redesign the SpMV algorithm with NUMA tools. The experimental results show that our approach uses bandwidth much more efficiently and achieves an average SpMV speedup of 1.76x on Phytium 2000+.
%G English
%Z TC 10
%Z WG 10.3
%2 https://inria.hal.science/hal-03768726/document
%2 https://inria.hal.science/hal-03768726/file/511910_1_En_20_Chapter.pdf
%L hal-03768726
%U https://inria.hal.science/hal-03768726
%~ IFIP-LNCS
%~ IFIP
%~ IFIP-TC
%~ IFIP-TC10
%~ IFIP-NPC
%~ IFIP-WG10-3
%~ IFIP-LNCS-12639