Conference Papers, Year: 2016

Empirical Assessment of Performance Measures for Preprocessing Moments in Imbalanced Data Classification Problem

Paweł Szeszko
Magdalena Topczewska

Abstract

The article concerns the problem of imbalanced data classification, in which the classes that elements belong to are not equally represented. When building a classification model, cross-validation is one of the most popular techniques for assessing the efficacy of a classifier. Over-sampling methods create new objects to balance the number of objects across classes, but applying the preprocessing at an inappropriate moment has a direct impact on the achieved results: in most cases they are overestimated. To present and assess this phenomenon, this paper uses three preprocessing techniques (SMOTE, Safe-level SMOTE, SPIDER) and their modifications to generate new elements that balance the class cardinalities, and compares two classification methods (SVM, C4.5). k-fold cross-validation ($$k=10$$) is performed with the preprocessing applied at two different moments. Precision, recall, F-measure and area under the ROC curve (AUC) are calculated and compared.
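The effect described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the two "moments" are read as over-sampling before the data is split into folds versus over-sampling only the training part of each fold; plain random duplication stands in for SMOTE, Safe-level SMOTE and SPIDER; a 1-nearest-neighbour rule stands in for SVM and C4.5; and the data is synthetic noise with no real class signal. All function names are hypothetical.

```python
import random

def oversample(rows, rng):
    """Duplicate random minority rows until both classes have equal size.
    Assumption: a simple stand-in for SMOTE-like methods, not SMOTE itself."""
    minority = [r for r in rows if r[1] == 1]
    majority = [r for r in rows if r[1] == 0]
    if not minority:
        return list(rows)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(rows) + extra

def predict_1nn(train, x):
    """Label of the training row whose feature is closest to x."""
    return min(train, key=lambda r: abs(r[0] - x))[1]

def cv_minority_recall(rows, k, rng, oversample_before_split):
    """k-fold cross-validated recall on the minority class (label 1)."""
    data = list(rows)
    if oversample_before_split:
        # Moment 1: balance the whole data set, then split into folds.
        data = oversample(data, rng)
    rng.shuffle(data)
    hits = total = 0
    for i in range(k):
        test = data[i::k]
        train = [r for j, r in enumerate(data) if j % k != i]
        if not oversample_before_split:
            # Moment 2: balance only the training part of each fold.
            train = oversample(train, rng)
        for x, label in test:
            if label == 1:
                total += 1
                hits += predict_1nn(train, x) == 1
    return hits / total

# Synthetic data: the single feature is pure noise, unrelated to the label.
gen = random.Random(0)
rows = [(gen.random(), 0) for _ in range(90)] + [(gen.random(), 1) for _ in range(10)]

leaky = cv_minority_recall(rows, 10, random.Random(1), oversample_before_split=True)
honest = cv_minority_recall(rows, 10, random.Random(1), oversample_before_split=False)
print(f"recall, over-sampling before the split: {leaky:.2f}")
print(f"recall, over-sampling inside each fold: {honest:.2f}")
```

In this sketch, over-sampling before the split lets copies of the same minority object land in both the training and the test part of a fold, so the measured recall is inflated even though the feature carries no information. Synthetic-interpolation methods such as SMOTE would leak less directly than exact duplicates, so the size of the gap here should not be read as the paper's result.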
Origin: Files produced by the author(s)

Dates and versions

hal-01637457, version 1 (17-11-2017)

Cite

Paweł Szeszko, Magdalena Topczewska. Empirical Assessment of Performance Measures for Preprocessing Moments in Imbalanced Data Classification Problem. 15th IFIP International Conference on Computer Information Systems and Industrial Management (CISIM), Sep 2016, Vilnius, Lithuania. pp.183-194, ⟨10.1007/978-3-319-45378-1_17⟩. ⟨hal-01637457⟩