%0 Conference Proceedings
%T Data Fingerprinting with Similarity Digests
%+ University of New Orleans
%A Roussev, Vassil
%< peer-reviewed
%( IFIP Advances in Information and Communication Technology
%B 6th IFIP WG 11.9 International Conference on Digital Forensics (DF)
%C Hong Kong, China
%Y Kam-Pui Chow; Sujeet Shenoi
%I Springer
%3 Advances in Digital Forensics VI
%V AICT-337
%P 207-226
%8 2010-01-04
%D 2010
%R 10.1007/978-3-642-15506-2_15
%K Data fingerprinting
%K similarity digests
%K fuzzy hashing
%Z Computer Science [cs]/Digital Libraries [cs.DL]; Conference papers
%X State-of-the-art techniques for data fingerprinting have been based on randomized feature selection pioneered by Rabin in 1981. This paper proposes a new, statistical approach for selecting fingerprinting features. The approach relies on entropy estimates and a sizeable empirical study to pick out the features that are most likely to be unique to a data object and, therefore, least likely to trigger false positives. The paper also describes the implementation of a tool (sdhash) and the results of an evaluation study. The results demonstrate that the approach works consistently across different types of data, and its compact footprint allows for the digests of targets in excess of 1 TB to be queried in memory.
%G English
%2 https://inria.hal.science/hal-01060620/document
%2 https://inria.hal.science/hal-01060620/file/Roussev10.pdf
%L hal-01060620
%U https://inria.hal.science/hal-01060620
%~ IFIP-LNCS
%~ IFIP
%~ IFIP-AICT
%~ IFIP-AICT-337
%~ IFIP-TC
%~ IFIP-WG
%~ IFIP-TC11
%~ IFIP-DF
%~ IFIP-WG11-9