PDF Malware Detection Using Visualization and Machine Learning

Ching-Yuan Liu; Min-Yi Chiu; Qi-Xian Huang; Hung-Min Sun

doi:10.1007/978-3-030-81242-3_12

Conference paper, year: 2021


Ching-Yuan Liu, National Tsing Hua University [Hsinchu]
Min-Yi Chiu, National Tsing Hua University [Hsinchu]
Qi-Xian Huang, National Tsing Hua University [Hsinchu]
Hung-Min Sun, National Tsing Hua University [Hsinchu]

Abstract

As malware incidents are reported worldwide with increasing frequency, malware detection has drawn growing attention as a way to stop attacks before they cause damage. Malware varies widely across the software platforms people use: for example, XcodeGhost targets iOS apps, FakePlayer targets Android apps, and WannaCrypt targets PCs. Moreover, people often overlook security threats while browsing the internet, processing files, or even reading email. The Portable Document Format (PDF), one of the most commonly used file types in the world, can store text, images, multimedia content, and even scripts. Yet despite the growing popularity of and demand for PDF files, few people know how easily malware can be concealed in a seemingly normal PDF file. In this paper, we propose a novel technique that combines malware visualization and image classification to identify malicious PDF files. By extracting data from a PDF file and traversing each object within it, we obtain the file's complete tree-like structure. Then, according to the signature of each object, we assign colors obtained from SimHash to generate an RGB image. Finally, our proposed model, trained with the VGG19 CNN architecture, achieves up to 0.973 accuracy and a 0.975 F1-score in distinguishing malicious PDF files. The approach is easy to implement and viable for personal or enterprise-wide use.
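
The abstract does not specify how an object's signature becomes a color, so the following is only a minimal sketch of one plausible reading: a 24-bit SimHash is computed over the dictionary keys of each PDF object and split into red, green, and blue bytes. The choice of keys as the signature, the 24-bit width, the SHA-256 token hash, and the names `simhash`, `signature_to_rgb`, `walk`, and `toy_tree` are all assumptions made for illustration, not the authors' implementation.

```python
import hashlib


def simhash(tokens, bits=24):
    """Return a `bits`-wide SimHash fingerprint of an iterable of string tokens."""
    weights = [0] * bits
    for token in tokens:
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        h = int.from_bytes(digest, "big")
        for i in range(bits):
            # Each token votes +1 or -1 on every bit position.
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


def signature_to_rgb(keys):
    """Map an object's signature (assumed here: its key names) to an (R, G, B) tuple."""
    fp = simhash(keys, bits=24)
    return ((fp >> 16) & 0xFF, (fp >> 8) & 0xFF, fp & 0xFF)


def walk(obj, depth=0):
    """Yield (depth, rgb) for each node of a toy object tree, depth-first."""
    yield depth, signature_to_rgb(obj["keys"])
    for child in obj.get("children", []):
        yield from walk(child, depth + 1)


# Hypothetical, hand-written stand-in for a parsed PDF object tree.
toy_tree = {
    "keys": ["/Type", "/Pages", "/OpenAction"],
    "children": [
        {"keys": ["/Type", "/Kids", "/Count"]},
        {"keys": ["/S", "/JS"]},
    ],
}

if __name__ == "__main__":
    for depth, rgb in walk(toy_tree):
        print("  " * depth + "#%02x%02x%02x" % rgb)
```

Because SimHash sums per-bit votes across tokens, the resulting color does not depend on key order, and objects that share most of their keys tend to receive fingerprints that differ in few bits. Laying the per-object colors out as pixels would then give an RGB image suitable for a CNN classifier; that layout step is not shown here because the abstract does not describe it.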

Keywords

Malware detection; PDF malware; Malware visualization; Machine learning

Domains

Computer Science [cs]

Main file

513274_1_En_12_Chapter.pdf (885.8 KB)

Origin: Files produced by the author(s)


https://inria.hal.science/hal-03677029

Submitted on: Tuesday, May 24, 2022, 2:23:23 PM

Last modification on: Tuesday, May 24, 2022, 2:29:18 PM

Long-term archiving on: Tuesday, August 30, 2022, 10:19:37 AM

Dates and versions

hal-03677029 , version 1 (24-05-2022)

Licence

Attribution

Identifiers

HAL Id : hal-03677029 , version 1
DOI : 10.1007/978-3-030-81242-3_12

Cite

Ching-Yuan Liu, Min-Yi Chiu, Qi-Xian Huang, Hung-Min Sun. PDF Malware Detection Using Visualization and Machine Learning. 35th IFIP Annual Conference on Data and Applications Security and Privacy (DBSec), Jul 2021, Calgary, AB, Canada. pp.209-220, ⟨10.1007/978-3-030-81242-3_12⟩. ⟨hal-03677029⟩


Collections

IFIP-LNCS IFIP IFIP-TC IFIP-WG IFIP-TC11 IFIP-WG11-3 IFIP-DBSEC IFIP-LNCS-12840
