CSAS: Cost-Based Storage Auto-Selection, a Fine-Grained Storage Selection Mechanism for Spark
Abstract
To improve system performance, Spark can keep RDDs in memory for later reuse through its caching mechanism, and it offers a variety of storage levels for cached RDDs. However, storage levels are selected manually and per RDD, and this RDD-grained mechanism cannot adapt to the computing resources available on each node. In this paper, we first present a fine-grained, automatic storage level selection mechanism. We then select a storage level for each partition using a cost model that fully considers the system's resource status as well as the costs of compression and serialization. Experiments show that our approach offers up to a 77% performance improvement compared with the default storage level scheme provided by Spark.
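The abstract does not spell out the cost model, so the following is only a minimal sketch of the general idea of per-partition, cost-based storage level selection, not the authors' model. It assumes a hypothetical cost function that charges serialization, compression, and disk I/O time per write and per read, rejects any level whose memory footprint exceeds the node's free storage memory, and picks the cheapest remaining level for each partition in turn. All names (`NodeStatus`, `Partition`, `candidate_costs`, `choose_level`, `assign_levels`), the rate parameters, and the `MEMORY_ONLY_SER_COMPRESSED` label are hypothetical; in Spark, compressing serialized cached partitions is controlled by the `spark.rdd.compress` setting rather than by a separate storage level.

```python
from dataclasses import dataclass

# Hypothetical inputs; field names and units are illustrative assumptions.
@dataclass
class NodeStatus:
    free_memory: float    # bytes of storage memory currently free on the node
    ser_rate: float       # bytes/s, assumed equal for serialize and deserialize
    compress_rate: float  # bytes/s, assumed equal for compress and decompress
    disk_rate: float      # bytes/s, assumed equal for disk read and write

@dataclass
class Partition:
    ser_size: float         # serialized size in bytes
    object_overhead: float  # deserialized size / serialized size (assumed >= 1)
    compress_ratio: float   # compressed size / serialized size (0 < r <= 1)
    reads: int              # expected number of future accesses

def candidate_costs(p, node):
    """Return {level: (memory_bytes, seconds)} under a simple hypothetical model."""
    s = p.ser_size
    ser = s / node.ser_rate         # time for one serialize or deserialize pass
    comp = s / node.compress_rate   # time for one compress or decompress pass
    disk = s / node.disk_rate       # time for one disk write or read
    return {
        # Deserialized objects: no per-read CPU cost, largest footprint.
        "MEMORY_ONLY": (s * p.object_overhead, 0.0),
        # Serialize once, deserialize on every read.
        "MEMORY_ONLY_SER": (s, ser + p.reads * ser),
        # Hypothetical label: serialized + compressed bytes kept in memory.
        "MEMORY_ONLY_SER_COMPRESSED": (s * p.compress_ratio,
                                       (ser + comp) * (1 + p.reads)),
        # Serialized bytes on disk: no memory, but I/O on write and every read.
        "DISK_ONLY": (0.0, (ser + disk) * (1 + p.reads)),
    }

def choose_level(p, node, free_memory):
    """Pick the cheapest level whose memory footprint fits in free_memory."""
    costs = candidate_costs(p, node)
    feasible = {lvl: t for lvl, (mem, t) in costs.items() if mem <= free_memory}
    return min(feasible, key=feasible.get)  # DISK_ONLY is always feasible

def assign_levels(partitions, node):
    """Choose a level per partition, consuming free memory as levels are assigned."""
    free = node.free_memory
    levels = []
    for p in partitions:
        level = choose_level(p, node, free)
        free -= candidate_costs(p, node)[level][0]
        levels.append(level)
    return levels

node = NodeStatus(free_memory=400e6, ser_rate=100e6, compress_rate=200e6, disk_rate=50e6)
part = Partition(ser_size=100e6, object_overhead=3.0, compress_ratio=0.5, reads=4)
print(assign_levels([part, part], node))  # ['MEMORY_ONLY', 'MEMORY_ONLY_SER']
```

In this toy run, the first partition fits deserialized in memory, which leaves only enough room to cache the second one in serialized form. This illustrates why a per-partition choice can differ from a single RDD-wide storage level.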
Domains
Computer Science [cs]
Origin: Files produced by the author(s)