%0 Conference Proceedings
%T Adversarial Sampling Attacks Against Phishing Detection
%+ Colorado State University [Fort Collins] (CSU)
%+ Mahindra Ecole Centrale [Hyderabad] (MEC)
%A Shirazi, Hossein
%A Bezawada, Bruhadeshwar
%A Ray, Indrakshi
%A Anderson, Charles
%Z Part 2: Mobile and Web Security
%< peer-reviewed
%( Lecture Notes in Computer Science
%B 33rd IFIP Annual Conference on Data and Applications Security and Privacy (DBSec)
%C Charleston, SC, United States
%Y Simon N. Foley
%I Springer International Publishing
%3 Data and Applications Security and Privacy XXXIII
%V LNCS-11559
%P 83-101
%8 2019-07-15
%D 2019
%R 10.1007/978-3-030-22479-0_5
%K Phishing
%K Machine learning
%K Adversarial sampling
%K Classifiers
%Z Computer Science [cs]
%Z Conference papers
%X Phishing websites trick users into believing that they are interacting with a legitimate website and thereby capture sensitive information such as user names, passwords, credit card numbers, and other personal data. Machine learning appears to be a promising technique for distinguishing phishing websites from legitimate ones. However, machine learning approaches are susceptible to adversarial learning techniques, which attempt to degrade the accuracy of a trained classifier model. In this work, we investigate the robustness of machine-learning-based phishing detection in the face of adversarial learning techniques. We propose a simple but effective approach to simulating attacks by generating adversarial samples through direct feature manipulation. We assume that the attacker has limited knowledge of the features, the learning models, and the datasets used for training. We conducted experiments on four publicly available datasets. Our experiments reveal that the phishing detection mechanisms are vulnerable to adversarial learning techniques. Specifically, the identification rate for phishing websites dropped to 70% when a single feature was manipulated.
When four features were manipulated, the identification rate dropped to zero percent. This result means that any phishing sample which would have been detected correctly by a classifier model can bypass the classifier by changing at most four feature values: a small effort for an attacker in return for a large reward. We define the concept of a vulnerability level for each dataset, which measures the number of features that can be manipulated and the cost of each manipulation. Such a metric will allow us to compare multiple defense models.
%G English
%Z TC 11
%Z WG 11.3
%2 https://inria.hal.science/hal-02384598/document
%2 https://inria.hal.science/hal-02384598/file/480962_1_En_5_Chapter.pdf
%L hal-02384598
%U https://inria.hal.science/hal-02384598
%~ IFIP-LNCS
%~ IFIP
%~ IFIP-TC
%~ IFIP-WG
%~ IFIP-TC11
%~ IFIP-WG11-3
%~ IFIP-DBSEC
%~ IFIP-LNCS-11559