Data Type Classification: Hierarchical Class-to-Type Modeling
Abstract
Data and file type classification research conducted over the past ten to fifteen years has been dominated by competing experiments that vary only the number of classes, the types of classes, the machine learning technique and the input vector. There has been surprisingly little innovation in fundamental approaches to data and file type classification. This chapter focuses on the empirical testing of a hypothesized two-level hierarchical classification model and on the empirical derivation and testing of several alternative classification models. Comparative evaluations are conducted on ten classification models to identify a winning two-level classification model consisting of five classes and 52 lower-level data and file types. Experimental results demonstrate that the approach leads to very good class-level classification performance, improved classification performance for data and file types without high entropy, and reasonably equivalent classification performance for high-entropy data and file types (e.g., compressed and encrypted data).
Origin | Files produced by the author(s) |
---|