CSAS: Cost-Based Storage Auto-Selection, a Fine Grained Storage Selection Mechanism for Spark - Network and Parallel Computing (NPC 2017)
Conference Papers Year : 2017

Bo Wang
Jie Tang

Abstract

To improve system performance, Spark uses its caching mechanism to keep RDDs in memory for later access, and it provides a variety of storage levels for cached RDDs. However, this manual, RDD-grained storage level selection cannot adapt to the computing resources of each node. In this paper, we first present a fine-grained automatic storage level selection mechanism. We then choose a storage level for each partition based on a cost model that fully considers system resource status as well as compression and serialization costs. Experiments show that our approach can offer up to a 77% performance improvement over the default storage level scheme provided by Spark.
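The idea of choosing a storage level per partition from a cost estimate can be illustrated with a minimal sketch. This is not the cost model from the paper, which the abstract does not spell out. The level names match Spark's built-in storage levels, but `NodeState`, `access_cost`, `select_level`, and all throughput and ratio parameters are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Level names mirror Spark's built-in storage levels; the rest is illustrative.
LEVELS = ("MEMORY_ONLY", "MEMORY_ONLY_SER", "DISK_ONLY")


@dataclass
class NodeState:
    """Hypothetical snapshot of one node's resources (all values assumed)."""
    free_memory_mb: float   # memory currently available for caching
    disk_read_mb_s: float   # assumed disk read throughput
    deser_mb_s: float       # assumed deserialization throughput
    ser_ratio: float        # assumed serialized size / deserialized size


def access_cost(level: str, size_mb: float, node: NodeState) -> float:
    """Estimated seconds to read one cached partition back (hypothetical model).

    Returns infinity when the partition does not fit in memory at that level.
    """
    ser_size = size_mb * node.ser_ratio
    if level == "MEMORY_ONLY":
        # Deserialized objects in memory: no decoding cost, largest footprint.
        return 0.0 if size_mb <= node.free_memory_mb else float("inf")
    if level == "MEMORY_ONLY_SER":
        # Serialized bytes in memory: smaller footprint, pay to deserialize.
        if ser_size > node.free_memory_mb:
            return float("inf")
        return ser_size / node.deser_mb_s
    if level == "DISK_ONLY":
        # Serialized bytes on disk: pay for the disk read and deserialization.
        return ser_size / node.disk_read_mb_s + ser_size / node.deser_mb_s
    raise ValueError(f"unknown storage level: {level}")


def select_level(size_mb: float, node: NodeState) -> str:
    """Pick the cheapest level for a single partition on this node."""
    return min(LEVELS, key=lambda lvl: access_cost(lvl, size_mb, node))


node = NodeState(free_memory_mb=100, disk_read_mb_s=200, deser_mb_s=400, ser_ratio=0.5)
for size in (80, 150, 300):
    print(size, select_level(size, node))
```

Under these assumed numbers, a partition that fits in memory stays deserialized, a larger one is kept serialized in memory, and one that does not fit even when serialized goes to disk. The decision is made per partition and per node, not once for the whole RDD.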

Dates and versions

hal-01705452 , version 1 (09-02-2018)

Cite

Bo Wang, Jie Tang, Rui Zhang, Zhimin Gu. CSAS: Cost-Based Storage Auto-Selection, a Fine Grained Storage Selection Mechanism for Spark. 14th IFIP International Conference on Network and Parallel Computing (NPC), Oct 2017, Hefei, China. pp.150-154, ⟨10.1007/978-3-319-68210-5_18⟩. ⟨hal-01705452⟩