Failure Analysis and Modeling in Large Multi-site Infrastructures - Distributed Applications and Interoperable Systems
Conference paper, 2013

Failure Analysis and Modeling in Large Multi-site Infrastructures

Abstract

Large multi-site infrastructures such as Grids and Clouds must implement fault-tolerance mechanisms and smart schedulers to enable continuous operation even when resource failures occur. Evaluating the efficiency of such mechanisms and schedulers requires representative failure models that capture realistic properties of real-world failure data. This paper shows that failures in multi-site infrastructures are far from being randomly distributed. We propose a failure model that captures features observed in real failure traces.
Origin: Files produced by the author(s)

Dates and versions

hal-01489451, version 1 (14-03-2017)

Cite

Tran Ngoc Minh, Guillaume Pierre. Failure Analysis and Modeling in Large Multi-site Infrastructures. 13th International Conference on Distributed Applications and Interoperable Systems (DAIS), Jun 2013, Florence, Italy. pp.127-140, ⟨10.1007/978-3-642-38541-4_10⟩. ⟨hal-01489451⟩