A DAG Refactor Based Automatic Execution Optimization Mechanism for Spark - Network and Parallel Computing
Conference paper. Year: 2019


Abstract

In today’s big data era, the traditional disk-based MapReduce framework has encountered bottlenecks due to its low memory utilization and inefficient orchestration of complex tasks. Spark makes full use of memory resources, provides a rich set of data manipulation operators, and uses a DAG (directed acyclic graph) to express the dependencies between them. Spark splits an entire job into multiple stages according to the DAG and schedules them in a distributed execution environment, which makes it better suited to the new characteristics of big data processing. However, Spark does not consider the resource requirements of different operators and schedules them indiscriminately, which can cause load imbalance across the nodes of a cluster and turn some nodes into bottlenecks because of their excessive resource consumption. Previously, solving this problem required developers to have extensive Spark experience and to write sophisticated code. In this paper, we propose a DAG refactor based automatic execution optimization mechanism for Spark. The experimental results show that the DAG refactor mechanism can improve Spark performance by up to 8.8X without altering the semantics of the original program.
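To illustrate what "splitting a job into stages according to the DAG" means, here is a minimal Python sketch of the general idea. It is not Spark's DAGScheduler and not the paper's DAG refactor mechanism. It models only the well-known rule that narrow dependencies are pipelined into one stage while each wide (shuffle) dependency starts a new parent stage. The `Op` class, the `split_into_stages` function, and the `"narrow"`/`"wide"` labels are hypothetical names chosen for this example, and operator names are assumed to be unique.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Op:
    """A node in a simplified lineage graph (hypothetical model, not Spark's API)."""
    name: str  # assumed unique within one graph
    # Each entry pairs a parent Op with its dependency type: "narrow" or "wide".
    parents: list[tuple[Op, str]] = field(default_factory=list)


def split_into_stages(final: Op) -> list[list[str]]:
    """Cut the DAG ending at `final` into stages at wide (shuffle) dependencies.

    Narrow dependencies are pipelined into the current stage; each wide
    dependency marks a stage boundary. Parent stages are listed before the
    stages that consume their output. Caching and recomputation of shared
    ancestors, which real Spark handles, are ignored here.
    """
    stages: list[list[str]] = []
    built: dict[str, int] = {}  # name of a stage's last op -> stage index

    def build(root: Op) -> int:
        if root.name in built:  # this shuffle parent already has a stage
            return built[root.name]
        members: list[str] = []
        shuffle_parents: list[Op] = []
        seen: set[str] = set()

        def visit(op: Op) -> None:
            if op.name in seen:
                return
            seen.add(op.name)
            for parent, dep in op.parents:
                if dep == "narrow":
                    visit(parent)  # pipelined into this stage
                else:
                    shuffle_parents.append(parent)  # stage boundary
            members.append(op.name)  # post-order: upstream ops first

        visit(root)
        for parent in shuffle_parents:
            build(parent)  # create upstream stages first
        built[root.name] = len(stages)
        stages.append(members)
        return built[root.name]

    build(final)
    return stages


if __name__ == "__main__":
    # Word-count-like lineage: textFile -> flatMap -> map -> reduceByKey -> filter
    text = Op("textFile")
    words = Op("flatMap", [(text, "narrow")])
    pairs = Op("map", [(words, "narrow")])
    counts = Op("reduceByKey", [(pairs, "wide")])  # shuffle boundary
    result = Op("filter", [(counts, "narrow")])
    print(split_into_stages(result))
    # [['textFile', 'flatMap', 'map'], ['reduceByKey', 'filter']]
```

Because all operators inside a stage run pipelined as one unit, the stage rather than the individual operator becomes the unit that gets scheduled onto nodes.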
Main file
486810_1_En_30_Chapter.pdf (443.06 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03770537, version 1 (06-09-2022)

Licence

Attribution


Cite

Hang Zhao, Yu Rao, Donghua Li, Jie Tang, Shaoshan Liu. A DAG Refactor Based Automatic Execution Optimization Mechanism for Spark. 16th IFIP International Conference on Network and Parallel Computing (NPC), Aug 2019, Hohhot, China. pp.338-344, ⟨10.1007/978-3-030-30709-7_30⟩. ⟨hal-03770537⟩