%0 Conference Proceedings
%T Fast Content-Based File Type Identification
%+ Information Security Institute
%+ Ajou University
%A Ahmed, Irfan
%A Lhee, Kyung-Suk
%A Shin, Hyun-Jung
%A Hong, Man-Pyo
%Z Part 2: FORENSIC TECHNIQUES
%< peer-reviewed
%( IFIP Advances in Information and Communication Technology
%B 7th Digital Forensics (DF)
%C Orlando, FL, United States
%Y Gilbert Peterson
%Y Sujeet Shenoi
%I Springer
%3 Advances in Digital Forensics VII
%V AICT-361
%P 65-75
%8 2011-01-31
%D 2011
%R 10.1007/978-3-642-24212-0_5
%K File type identification
%K file content classification
%K byte frequency
%Z Computer Science [cs]/Conference papers
%X Digital forensic examiners often need to identify the type of a file or file fragment based on the content of the file. Content-based file type identification schemes typically use a byte frequency distribution with statistical machine learning to classify file types. Most algorithms analyze the entire file content to obtain the byte frequency distribution, a technique that is inefficient and time-consuming. This paper proposes two techniques for reducing the classification time. The first technique selects a subset of features based on the frequency of occurrence. The second speeds up classification by randomly sampling file blocks. Experimental results demonstrate that up to a fifteen-fold reduction in computational time can be achieved with limited impact on accuracy.
%G English
%Z TC 11
%Z WG 11.9
%2 https://inria.hal.science/hal-01569553/document
%2 https://inria.hal.science/hal-01569553/file/978-3-642-24212-0_5_Chapter.pdf
%L hal-01569553
%U https://inria.hal.science/hal-01569553
%~ IFIP-LNCS
%~ IFIP
%~ IFIP-AICT
%~ IFIP-TC
%~ IFIP-WG
%~ IFIP-TC11
%~ IFIP-DF
%~ IFIP-WG11-9
%~ IFIP-AICT-361