%0 Conference Proceedings
%T TPL: A Novel Analysis and Optimization Model for RDMA P2P Communication
%+ University of Chinese Academy of Sciences [Beijing] (UCAS)
%+ Institute of Computing Technology [Beijing] (ICT)
%A Du, Zhen
%A An, Zhongqi
%A Xing, Jing
%Z Part 8: Network
%< peer-reviewed
%( Lecture Notes in Computer Science
%B 17th IFIP International Conference on Network and Parallel Computing (NPC)
%C Zhengzhou, China
%Y Xin He
%Y En Shao
%Y Guangming Tan
%I Springer International Publishing
%3 Network and Parallel Computing
%V LNCS-12639
%P 395-406
%8 2020-09-28
%D 2020
%R 10.1007/978-3-030-79478-1_34
%K RDMA
%K Performance tuning
%K Optimization model
%Z Computer Science [cs]
%Z Conference papers
%X With increasing demand for networks with high throughput and low latency, RDMA is widely used because of its high performance. Because optimization can fully exploit the performance potential of RDMA, methods for RDMA optimization are very important. Existing mainstream research designs optimization methods by constructing a more complete view of the hardware and exploring the relation between software implementation and specific hardware behavior. However, the hardware architecture of the NIC (such as InfiniBand) is a “black box”, which limits the development of this type of optimization, so existing methods leave some problems unsolved. Besides, as RDMA technology develops, new features are constantly proposed, so analysis and optimization methods for RDMA communication performance should keep pace. The contributions of this paper are as follows: 1) We propose a new performance analysis and optimization model for RDMA point-to-point communication: TPL. This model provides a more comprehensive perspective on RDMA optimization. 2) Guided by TPL, we design corresponding optimization algorithms for an existing problem, such as WQE cache miss, and a new scenario, such as DCT. 3) We implement a new RDMA communication library, named ORCL, that brings our optimizations together. ORCL eliminates WQE cache misses in real time. We also simulate the workload of an in-memory KV system. Compared with existing RDMA communication implementations, ORCL increases throughput by 95% and reduces latency by 10%.
%G English
%Z TC 10
%Z WG 10.3
%2 https://inria.hal.science/hal-03768727/document
%2 https://inria.hal.science/hal-03768727/file/511910_1_En_34_Chapter.pdf
%L hal-03768727
%U https://inria.hal.science/hal-03768727
%~ IFIP-LNCS
%~ IFIP
%~ IFIP-TC
%~ IFIP-TC10
%~ IFIP-NPC
%~ IFIP-WG10-3
%~ IFIP-LNCS-12639