%0 Conference Proceedings
%T NUMA-Aware Optimization of Sparse Matrix-Vector Multiplication on ARMv8-Based Many-Core Architectures
%+ China University of Petroleum Beijing (CUP)
%+ National University of Defense Technology [China]
%A Yu, Xiaosong
%A Ma, Huihui
%A Qu, Zhengyu
%A Fang, Jianbin
%A Liu, Weifeng
%Z Part 4: Architecture and Hardware
%< peer-reviewed
%( Lecture Notes in Computer Science
%B 17th IFIP International Conference on Network and Parallel Computing (NPC)
%C Zhengzhou, China
%Y Xin He
%Y En Shao
%Y Guangming Tan
%I Springer International Publishing
%3 Network and Parallel Computing
%V LNCS-12639
%P 231-242
%8 2020-09-28
%D 2020
%R 10.1007/978-3-030-79478-1_20
%K Sparse matrix-vector multiplication
%K NUMA architecture
%K Hypergraph partitioning
%K Phytium 2000+
%Z Computer Science [cs]
%Z Conference papers
%X As a fundamental operation, sparse matrix-vector multiplication (SpMV) plays a key role in solving a number of scientific and engineering problems. This paper presents a NUMA-aware optimization technique for the SpMV operation on the Phytium 2000+ ARMv8-based 64-core processor. We first provide a performance evaluation of the NUMA architecture of the Phytium 2000+ processor, then reorder the input sparse matrix with hypergraph partitioning for better cache locality, and redesign the SpMV algorithm with NUMA tools. Experimental results on Phytium 2000+ show that our approach utilizes the bandwidth much more efficiently and achieves an average SpMV speedup of 1.76x.
%G English
%Z TC 10
%Z WG 10.3
%2 https://inria.hal.science/hal-03768726/document
%2 https://inria.hal.science/hal-03768726/file/511910_1_En_20_Chapter.pdf
%L hal-03768726
%U https://inria.hal.science/hal-03768726
%~ IFIP-LNCS
%~ IFIP
%~ IFIP-TC
%~ IFIP-TC10
%~ IFIP-NPC
%~ IFIP-WG10-3
%~ IFIP-LNCS-12639