%0 Conference Proceedings %T DroidAutoML: A Microservice Architecture to Automate the Evaluation of Android Machine Learning Detection Systems %+ the World Is Distributed Exploring the tension between scale and coordination (WIDE) %A Bromberg, Yérom-David %A Gitzinger, Louison %Z Part 4: Machine Learning for Systems %< peer-reviewed %( Lecture Notes in Computer Science %B DAIS - 20th IFIP International Conference on Distributed Applications and Interoperable Systems %C Valletta, Malta %Y Anne Remke %Y Valerio Schiavoni %I Springer International Publishing %3 Distributed Applications and Interoperable Systems %V LNCS-12135 %N 12135 %P 148-165 %8 2020-06-15 %D 2020 %R 10.1007/978-3-030-50323-9_10 %K Machine learning %K Android %K Malware %K AutoML %Z Computer Science [cs] %Z Computer Science [cs]/Networking and Internet Architecture [cs.NI]Conference papers %X The mobile ecosystem is witnessing an unprecedented increase in the amount of malware in the wild. To fight this threat, actors from both research and industry are constantly innovating to deliver concrete solutions that improve security and malware protection. Traditional solutions such as signature-based antivirus software have shown their limits in the face of the massive proliferation of new malware, most of which are variants specifically designed to bypass signature-based detection. This has paved the way for new approaches based on Machine Learning (ML) techniques to improve the detection of unknown malware variants. Unfortunately, these solutions are most often underexploited because of the time and resource costs required to adequately fine-tune machine learning algorithms. In practice, state-of-the-art studies in the Android community do not focus on model training; they most often rely on a manual, empirical process to choose the learning strategy and/or use default parameter values to configure ML algorithms. 
However, in the ML domain, it is well established that the tunability of hyper-parameters is of the utmost importance for solving an ML problem efficiently. Nevertheless, as soon as the targeted ML problem involves a massive amount of data, there is a strong tension between the feasibility of exploring all combinations and accuracy. This tension makes it necessary to automate the search for optimal hyper-parameters of ML algorithms, which can no longer be done manually. To this end, we propose a generic and scalable solution to automatically configure and evaluate the ML algorithms used by Android malware detection systems. Our approach is based on DevOps principles and a microservice architecture deployed over a set of nodes to scale and exhaustively test a large number of ML algorithms and hyper-parameter combinations. With our approach, we are able to systematically find the best fit and increase the accuracy of two state-of-the-art Android malware detection systems by up to 11%. %G English %Z TC 6 %Z WG 6.1 %2 https://inria.hal.science/hal-03223251/document %2 https://inria.hal.science/hal-03223251/file/495624_1_En_10_Chapter.pdf %L hal-03223251 %U https://inria.hal.science/hal-03223251 %~ UNIV-RENNES1 %~ CNRS %~ INRIA %~ UNIV-UBS %~ INSA-RENNES %~ INRIA-RENNES %~ IRISA %~ IRISA_SET %~ INRIA_TEST %~ TESTALAIN1 %~ IFIP-LNCS %~ IFIP %~ CENTRALESUPELEC %~ INRIA2 %~ IFIP-TC %~ IFIP-WG %~ IFIP-TC6 %~ IFIP-WG6-1 %~ IFIP-DAIS %~ UR1-HAL %~ UR1-MATH-STIC %~ UR1-UFR-ISTIC %~ TEST-UR-CSS %~ UNIV-RENNES %~ INRIA-RENGRE %~ UR1-MATH-NUM %~ IFIP-LNCS-12135