%0 Conference Proceedings
%T Distributed Monitoring and Management of Exascale Systems in the Argo Project
%+ Argonne National Laboratory [Lemont] (ANL)
%+ University of Chicago
%+ Lawrence Livermore National Laboratory (LLNL)
%A Perarnau, Swann
%A Thakur, Rajeev
%A Iskra, Kamil
%A Raffenetti, Ken
%A Cappello, Franck
%A Gupta, Rinku
%A Beckman, Pete
%A Snir, Marc
%A Hoffmann, Henry
%A Schulz, Martin
%A Rountree, Barry
%< peer-reviewed
%( Lecture Notes in Computer Science
%B 15th IFIP International Conference on Distributed Applications and Interoperable Systems (DAIS)
%C Grenoble, France
%Y Alysson Bessani
%Y Sara Bouchenak
%I Springer International Publishing
%3 Distributed Applications and Interoperable Systems
%V LNCS-9038
%P 173-178
%8 2015-06-02
%D 2015
%R 10.1007/978-3-319-19129-4_14
%K Power Budget
%K Lawrence Livermore National Laboratory
%K Logical Partition
%K Argo System
%K Message Broker
%Z Computer Science [cs]
%Z Computer Science [cs]/Networking and Internet Architecture [cs.NI] Conference papers
%X New computing technologies are expected to change the high-performance computing landscape dramatically. Future exascale systems will comprise hundreds of thousands of compute nodes linked by complex networks, resources that need to be actively monitored and controlled, at a scale difficult to manage from a central point as in previous systems. In this context, we describe here on-going work in the Argo exascale software stack project to develop a distributed collection of services working together to track scientific applications across nodes, control the power budget of the system, and respond to eventual failures. Our solution leverages the idea of enclaves: a hierarchy of logical partitions of the system, representing groups of nodes sharing a common configuration, created to encapsulate user jobs as well as by the user inside its own job.
These enclaves provide a second (and greater) level of control over portions of the system, can be tuned to manage specific scenarios, and have dedicated resources to do so.
%G English
%Z TC 6
%Z WG 6.1
%2 https://inria.hal.science/hal-01775026/document
%2 https://inria.hal.science/hal-01775026/file/978-3-319-19129-4_14_Chapter.pdf
%L hal-01775026
%U https://inria.hal.science/hal-01775026
%~ IFIP-LNCS
%~ IFIP
%~ IFIP-TC
%~ IFIP-WG
%~ IFIP-TC6
%~ IFIP-WG6-1
%~ IFIP-DAIS
%~ IFIP-DISCOTEC
%~ IFIP-LNCS-9038