%0 Conference Proceedings
%T CSAS: Cost-Based Storage Auto-Selection, a Fine Grained Storage Selection Mechanism for Spark
%+ Beijing Institute of Technology (BIT)
%+ Southern University of Science and Technology (SUSTech)
%A Wang, Bo
%A Tang, Jie
%A Zhang, Rui
%A Gu, Zhimin
%< peer-reviewed
%( Lecture Notes in Computer Science
%B 14th IFIP International Conference on Network and Parallel Computing (NPC)
%C Hefei, China
%Y Xuanhua Shi
%Y Hong An
%Y Chao Wang
%Y Mahmut Kandemir
%Y Hai Jin
%I Springer International Publishing
%3 Network and Parallel Computing
%V LNCS-10578
%P 150-154
%8 2017-10-20
%D 2017
%R 10.1007/978-3-319-68210-5_18
%K Big data
%K Spark
%K Storage level selection
%K Optimize
%Z Computer Science [cs]
%Z Conference papers
%X To improve system performance, Spark uses its caching mechanism to keep RDDs in memory for later access, and it provides a variety of storage levels for cached RDDs. However, the manual, RDD-grained storage level selection mechanism cannot adapt to the computing resources of each node. In this paper, we first present a fine-grained automatic storage level selection mechanism. We then select a storage level for each partition based on a cost model that fully considers system resource status as well as compression and serialization costs. Experiments show that our approach offers up to a 77% performance improvement over the default storage level scheme provided by Spark.
%G English
%Z TC 10
%Z WG 10.3
%2 https://inria.hal.science/hal-01705452/document
%2 https://inria.hal.science/hal-01705452/file/457609_1_En_18_Chapter.pdf
%L hal-01705452
%U https://inria.hal.science/hal-01705452
%~ IFIP-LNCS
%~ IFIP
%~ IFIP-TC
%~ IFIP-TC10
%~ IFIP-NPC
%~ IFIP-WG10-3
%~ IFIP-LNCS-10578