%0 Conference Proceedings
%T Detecting malicious PDF documents using semi-supervised machine learning
%+ Institute of Information Engineering [Beijing] (IIE)
%+ The University of Hong Kong (HKU)
%+ Deakin University, Burwood, Australia
%A Jiang, Jianguo
%A Song, Nan
%A Yu, Min
%A Chow, Kam-Pui
%A Li, Gang
%A Liu, Chao
%A Huang, Weiqing
%Z Part 3: Advanced Forensic Techniques
%< peer-reviewed
%( IFIP Advances in Information and Communication Technology
%B 17th IFIP International Conference on Digital Forensics (DigitalForensics)
%C Virtual, China
%Y Gilbert Peterson
%Y Sujeet Shenoi
%I Springer International Publishing
%3 Advances in Digital Forensics XVII
%V AICT-612
%P 135-155
%8 2021-02-01
%D 2021
%R 10.1007/978-3-030-88381-2_7
%K Malicious PDF documents
%K machine learning
%K semi-supervised learning
%Z Computer Science [cs]
%Z Conference papers
%X Portable Document Format (PDF) documents are often used as carriers of malicious code that launches attacks or steals personal information. Traditional manual and supervised-learning-based detection methods rely heavily on labeled samples of malicious documents. This is problematic because very few labeled malicious samples are available in real-world scenarios. This chapter presents a semi-supervised machine learning method for detecting malicious PDF documents. It extracts structural features as well as statistical features based on entropy sequences using the wavelet energy spectrum. A random sub-sampling strategy is employed to train multiple sub-classifiers. Each classifier is independent, which enhances the generalization capability during detection. The semi-supervised learning method enables labeled as well as unlabeled samples to be used to classify malicious and benign PDF documents. Experimental results demonstrate that the method yields an accuracy of 94% despite using training data with just 11% labeled malicious samples.
%G English
%2 https://inria.hal.science/hal-03764374/document
%2 https://inria.hal.science/hal-03764374/file/519603_1_En_7_Chapter.pdf
%L hal-03764374
%U https://inria.hal.science/hal-03764374
%~ IFIP-LNCS
%~ IFIP
%~ IFIP-AICT
%~ IFIP-TC
%~ IFIP-WG
%~ IFIP-TC11
%~ IFIP-DF
%~ IFIP-WG11-9
%~ IFIP-AICT-612