Hash-Based File Content Identification Using Distributed Systems

York Yannikos; Jonathan Schluessler; Martin Steinebach; Christian Winter; Kalman Graffi

doi:10.1007/978-3-642-41148-9_8

Conference paper, 2013

York Yannikos, Fraunhofer SIT - Fraunhofer Institute for Secure Information Technology [Darmstadt]

Jonathan Schluessler, Vector Informatik [Stuttgart]

Martin Steinebach, Fraunhofer SIT - Fraunhofer Institute for Secure Information Technology [Darmstadt]

Christian Winter, Fraunhofer SIT - Fraunhofer Institute for Secure Information Technology [Darmstadt]

Kalman Graffi, Heinrich Heine Universität Düsseldorf = Heinrich Heine University [Düsseldorf]

Abstract

A major challenge in digital forensics is the handling of very large amounts of data. Since forensic investigators often have to analyze several terabytes of data in a single case, efficient and effective tools for automatic data identification and filtering are required. A common data identification technique is to match the cryptographic hashes of files with hashes stored in blacklists and whitelists in order to identify contraband and harmless content, respectively. However, blacklists and whitelists are never complete and they miss most of the files encountered in investigations. Also, cryptographic hash matching fails when file content is altered even very slightly. This paper analyzes several distributed systems for their ability to support file content identification. A framework is presented for automated file content identification that searches for file hashes and collects, aggregates and presents the search results. Experiments demonstrate that the framework can provide identifying information for 26% of the test files from their hashed content, helping reduce the workload of forensic investigators.
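The list-matching technique described in the abstract, and the reason it breaks on slightly altered content, can be illustrated with a minimal sketch. It is not the authors' framework: the function names, the sample byte strings, and the choice of SHA-256 are assumptions made for illustration, and the record does not say which hash function the paper uses.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string (hash choice is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def classify(data: bytes, blacklist: set[str], whitelist: set[str]) -> str:
    """Classify content by exact hash lookup (hypothetical helper)."""
    digest = sha256_hex(data)
    if digest in blacklist:
        return "contraband"
    if digest in whitelist:
        return "harmless"
    # Neither list knows this hash: the case the abstract says is most common.
    return "unknown"


# Hypothetical sample content standing in for real files.
known_bad = b"example contraband payload"
known_good = b"example operating system file"

blacklist = {sha256_hex(known_bad)}
whitelist = {sha256_hex(known_good)}

print(classify(known_bad, blacklist, whitelist))            # contraband
print(classify(known_good, blacklist, whitelist))           # harmless
# Appending a single byte changes the digest, so the exact match fails.
print(classify(known_bad + b"\x00", blacklist, whitelist))  # unknown
```

The last call shows the limitation the abstract points out: an exact cryptographic hash match says nothing about content that differs by even one byte, so such files fall into the "unknown" group that still needs another identification source.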

Domains

Computer Science [cs]

Main file

978-3-642-41148-9_8_Chapter.pdf (1.75 MB)

Origin: Files produced by the author(s)
Licence: CC BY 4.0 - Attribution

https://inria.hal.science/hal-01460625

Submitted on: Tuesday, February 7, 2017, 5:26:33 PM

Last modification on: Monday, October 17, 2022, 9:46:06 AM

Long-term archiving on: Monday, May 8, 2017, 3:02:33 PM

Dates and versions

hal-01460625, version 1 (07-02-2017)

Identifiers

HAL Id: hal-01460625, version 1
DOI: 10.1007/978-3-642-41148-9_8

Cite

York Yannikos, Jonathan Schluessler, Martin Steinebach, Christian Winter, Kalman Graffi. Hash-Based File Content Identification Using Distributed Systems. 9th International Conference on Digital Forensics (DF), Jan 2013, Orlando, FL, United States. pp.119-134, ⟨10.1007/978-3-642-41148-9_8⟩. ⟨hal-01460625⟩
