PDF Malware Detection Using Visualization and Machine Learning - IFIP Open Digital Library
Conference paper, 2021

PDF Malware Detection Using Visualization and Machine Learning

Ching-Yuan Liu
Min-Yi Chiu
Qi-Xian Huang
Hung-Min Sun

Abstract

As more malware incidents are reported worldwide, malware detection has drawn increasing attention as a way to prevent attacks in advance. Malware varies widely across the software platforms people use, for example XcodeGhost on iOS apps, FakePlayer on Android apps, and WannaCrypt on PCs. Moreover, people often overlook potential security threats while surfing the internet, processing files, or even reading email. The Portable Document Format (PDF), one of the most commonly used file types in the world, can store text, images, multimedia content, and even scripts. Yet despite the growing popularity of and demand for PDF files, few people know how easily malware can be concealed in seemingly normal PDF files. In this paper, we propose a novel technique that combines malware visualization and image classification to examine PDF files and identify which ones might be malicious. By extracting data from a PDF file and traversing each object within it, we obtain the file's holistic tree-like structure. Then, according to the signatures of the objects in the file, we assign colors derived from SimHash to generate RGB images. Lastly, our proposed model, trained with the VGG19 CNN architecture, achieves up to 0.973 accuracy and a 0.975 F1-score in distinguishing malicious PDF files; the approach is easy to implement and viable for personal or enterprise-wide use.
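
The abstract does not spell out how object signatures become colors. The following is only a minimal sketch of the general idea, assuming each PDF object is summarized by a list of string tokens (here, its dictionary keys) and that a 24-bit SimHash fingerprint is split into one byte per RGB channel. The function names, the choice of tokens, and the 24-bit width are illustrative assumptions, not the authors' implementation.

```python
import hashlib
from typing import Iterable, List, Tuple


def simhash(tokens: Iterable[str], bits: int = 24) -> int:
    """Classic SimHash: each token hash votes +1/-1 per bit; keep the sign."""
    votes = [0] * bits
    mask = (1 << bits) - 1
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest, "big") & mask  # keep the low `bits` bits
        for i in range(bits):
            votes[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, vote in enumerate(votes):
        if vote > 0:
            fingerprint |= 1 << i
    return fingerprint


def to_rgb(fingerprint: int) -> Tuple[int, int, int]:
    """Split a 24-bit fingerprint into (R, G, B) bytes (assumed mapping)."""
    return (fingerprint >> 16) & 0xFF, (fingerprint >> 8) & 0xFF, fingerprint & 0xFF


def objects_to_pixels(objects: List[List[str]]) -> List[Tuple[int, int, int]]:
    """Map each object's token list to one RGB pixel, in traversal order."""
    return [to_rgb(simhash(tokens)) for tokens in objects]


if __name__ == "__main__":
    # Hypothetical token lists standing in for objects found while
    # traversing a PDF's object tree.
    sample_objects = [
        ["/Type", "/Catalog", "/Pages"],
        ["/Type", "/Page", "/Contents"],
        ["/S", "/JavaScript", "/JS"],
    ]
    for tokens, pixel in zip(sample_objects, objects_to_pixels(sample_objects)):
        print(tokens, "->", pixel)
```

Because SimHash maps similar token sets to nearby fingerprints, objects with similar signatures tend to receive similar colors, which is the property that makes the resulting image useful to a CNN classifier. How the pixels are arranged into a 2D image is left open here.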
Main file
513274_1_En_12_Chapter.pdf (885.8 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03677029, version 1 (24-05-2022)

Cite

Ching-Yuan Liu, Min-Yi Chiu, Qi-Xian Huang, Hung-Min Sun. PDF Malware Detection Using Visualization and Machine Learning. 35th IFIP Annual Conference on Data and Applications Security and Privacy (DBSec), Jul 2021, Calgary, AB, Canada. pp.209-220, ⟨10.1007/978-3-030-81242-3_12⟩. ⟨hal-03677029⟩