Multi-source Distributed System Data for AI-Powered Analytics
Abstract
The emerging field of Artificial Intelligence for IT Operations (AIOps) utilizes monitoring data, big data platforms, and machine learning, to automate operations and maintenance (O&M) tasks in complex IT systems. The available research data usually contain only a single source of information, often logs or metrics. The inability of the single-source data to describe precise state of the distributed systems leads to methods that fail to make effective use of the joint information, thus, producing large number of false predictions. Therefore, current data limits the possibilities for greater advances in AIOps research. To overcome these constraints, we created a complex distributed system testbed, which generates multi-source data composed of distributed traces, application logs, and metrics. This paper provides detailed descriptions of the infrastructure, testbed, experiments, and statistics of the generated data. Furthermore, it identifies how such data can be utilized as a stepping stone for the development of novel methods for O&M tasks such as anomaly detection, root cause analysis, and remediation.The data from the testbed and its code is available at
https://zenodo.org/record/3549604
.
Origin | Files produced by the author(s) |
---|