Self-Balancing Job Parallelism and Throughput in Hadoop

In a Hadoop cluster, the performance and resource consumption of MapReduce jobs depend not only on the characteristics of the applications and workloads, but also on appropriate settings of the Hadoop configuration parameters. However, when job workloads are not known a priori or evolve over time, a static configuration may quickly waste computing resources and consequently degrade performance. In this paper, we therefore propose an online approach that dynamically reconfigures Hadoop at runtime. Concretely, we focus on balancing job parallelism and throughput by adjusting the memory configuration of the Hadoop Capacity Scheduler. Our evaluation shows that the approach outperforms vanilla Hadoop deployments by up to 40% and the best statically profiled configurations by up to 13%.
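The idea of rebalancing parallelism and throughput at runtime can be illustrated with a small feedback rule. This is a minimal sketch, not the authors' algorithm, which the abstract does not spell out. It assumes the tuned knob is the share of cluster memory reserved for application masters (in YARN, one plausible candidate is the Capacity Scheduler property `yarn.scheduler.capacity.maximum-am-resource-percent`; the abstract does not name the parameter). The `ClusterSample` type, the `next_am_share` function, and all thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClusterSample:
    """One hypothetical monitoring sample; field names are illustrative."""
    pending_jobs: int          # jobs waiting for an application master
    memory_utilization: float  # fraction of cluster memory in use, 0.0..1.0


def next_am_share(current: float, sample: ClusterSample,
                  step: float = 0.05, low: float = 0.05, high: float = 0.95,
                  busy: float = 0.9) -> float:
    """Return the next application-master memory share (assumed knob).

    All defaults are illustrative assumptions, not values from the paper.
    """
    if sample.pending_jobs > 0 and sample.memory_utilization < busy:
        # Jobs are queued while memory sits idle: allow more parallel jobs.
        proposed = current + step
    elif sample.memory_utilization >= busy:
        # Memory is saturated: give room back to task containers.
        proposed = current - step
    else:
        # Neither symptom is present: keep the current setting.
        proposed = current
    # Clamp to the allowed range and round away floating-point noise.
    return round(min(high, max(low, proposed)), 2)
```

A controller built on this rule would sample the cluster periodically, call `next_am_share`, and apply the result only when it differs from the current value. How the new value is pushed to the scheduler is deliberately left out of the sketch.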

Main file

zhang-dais16.pdf (629.22 KB)

Origin: Files produced by the author(s)
Licence: CC BY 4.0 - Attribution

https://inria.hal.science/hal-01294834

Submitted on: Tuesday, June 14, 2016, 9:19:19 AM

Last modification on: Monday, October 27, 2025, 9:34:02 AM

Long-term archiving on: Thursday, September 15, 2016, 10:23:57 AM

Dates and versions

hal-01294834, version 1 (14-06-2016)

Identifiers

HAL Id: hal-01294834, version 1
DOI: 10.1007/978-3-319-39577-7_11

Cite

Bo Zhang, Filip Křikava, Romain Rouvoy, Lionel Seinturier. Self-Balancing Job Parallelism and Throughput in Hadoop. 16th IFIP WG 6.1 International Conference on Distributed Applications and Interoperable Systems (DAIS), Jun 2016, Heraklion, Crete, Greece. pp.129-143, ⟨10.1007/978-3-319-39577-7_11⟩. ⟨hal-01294834⟩