Managing Terabyte-Scale Investigations with Similarity Digests
Abstract
The relentless increase in storage capacity and decrease in storage cost present an escalating challenge for digital forensic investigations – current forensic technologies are not designed to scale to the degree necessary to process the ever-increasing volumes of digital evidence. This paper describes a similarity-digest-based approach that scales up the task of finding related digital artifacts in massive data sets. The results show that digests can be generated at rates exceeding those of cryptographic hashes on commodity multi-core computing systems. In addition, querying the digest of a large (1 TB) target for the presence, or traces, of a small file can be completed in less than one second with very high precision and recall.
Domains
Computer Science [cs]
Origin: Files produced by the author(s)