%0 Conference Proceedings
%T DDFlasks: Deduplicated Very Large Scale Data Store
%+ Universidade do Minho = University of Minho [Braga]
%A Maia, Francisco
%A Paulo, João
%A Coelho, Fábio
%A Neves, Francisco
%A Pereira, José
%A Oliveira, Rui
%Z Part 2: Storing Data Smartly (Data storage)
%< peer-reviewed
%( Lecture Notes in Computer Science
%B 17th IFIP International Conference on Distributed Applications and Interoperable Systems (DAIS)
%C Neuchâtel, Switzerland
%Y Lydia Y. Chen
%Y Hans Reiser
%I Springer International Publishing
%3 Distributed Applications and Interoperable Systems
%V LNCS-10320
%P 51-66
%8 2017-06-19
%D 2017
%R 10.1007/978-3-319-59665-5_4
%Z Computer Science [cs]
%Z Computer Science [cs]/Networking and Internet Architecture [cs.NI]
%Z Conference papers
%X With the increasing number of connected devices, it becomes essential to find novel data management solutions that can leverage their computational and storage capabilities. However, developing very large scale data management systems requires tackling a number of interesting distributed systems challenges, namely continuous failures and high levels of node churn. In this context, epidemic-based protocols have proven suitable and effective and have been successfully used to build DataFlasks, an epidemic data store for massive-scale systems. Ensuring resiliency in this data store comes with a significant cost in storage resources and network bandwidth consumption. Deduplication has proven to be an efficient technique for reducing both costs, but applying it to a large-scale distributed storage system is not a trivial task. In fact, achieving significant space savings without compromising the resiliency and decentralized design of these storage systems is a relevant research challenge. In this paper, we extend DataFlasks with deduplication to design DDFlasks. This system is evaluated in a real-world scenario using Wikipedia snapshots, and the results are twofold. We show that deduplication is able to decrease storage consumption by up to 63% and decrease network bandwidth consumption by up to 20%, while maintaining a fully decentralized and resilient design.
%G English
%Z TC 6
%Z WG 6.1
%2 https://inria.hal.science/hal-01800122/document
%2 https://inria.hal.science/hal-01800122/file/450046_1_En_4_Chapter.pdf
%L hal-01800122
%U https://inria.hal.science/hal-01800122
%~ IFIP-LNCS
%~ IFIP
%~ IFIP-TC
%~ IFIP-WG
%~ IFIP-TC6
%~ IFIP-WG6-1
%~ IFIP-DAIS
%~ IFIP-DISCOTEC
%~ IFIP-LNCS-10320