Machine Learning Methods for Connection RTT and Loss Rate Estimation Using MPI Measurements Under Random Losses
Abstract
Scientific computations are expected to be increasingly distributed across wide-area networks, and the Message Passing Interface (MPI) has been shown to scale to support their communications over long distances. Application-level measurements of MPI operations reflect the connection Round-Trip Time (RTT) and loss rate, and machine learning methods have previously been developed to estimate them under deterministic periodic losses. In this paper, we consider more complex, random losses with uniform, Poisson, and Gaussian distributions. We study five disparate machine learning methods, with linear and non-linear, and smooth and non-smooth properties, to estimate RTT and loss rate over 10 Gbps connections with 0–366 ms RTT. The diversity and complexity of these estimators, combined with the randomness of losses and TCP's non-linear response, rule out the selection of a single best among them; instead, we fuse them to retain their design diversity. Overall, the results show that accurate estimates can be generated at low loss rates but become inaccurate at loss rates of 10% and higher, thereby illustrating both the strengths and limitations of these methods.