PIDT: A Novel Decision Tree Algorithm Based on Parameterised Impurities and Statistical Pruning Approaches

Daniel Stamate; Wajdi Alghamdi; Daniel Stahl; Doina Logofatu; Alexander Zamyatin

doi:10.1007/978-3-319-92007-8_24

Conference paper, Year : 2018

Daniel Stamate

Function : Author
PersonId : 1033545

University of London [London]

Wajdi Alghamdi

Function : Author
PersonId : 1033473

University of London [London]

Daniel Stahl

Function : Author

King's College London

Doina Logofatu

Function : Author

Frankfurt University of Applied Sciences

Alexander Zamyatin

Function : Author

National Research Tomsk State University

Abstract

When a decision tree is constructed, the criteria used to select the splitting attributes influence the performance of the resulting model. The best-known criteria, such as Shannon entropy and the Gini index, lack adaptability to the datasets. This paper presents novel splitting attribute selection criteria based on families of parameterised impurities that we propose here for use in the construction of optimal decision trees. These criteria rely on families of strictly concave functions that define the new generalised parameterised impurity measures, which we applied in devising and implementing our novel PIDT decision tree algorithm. This paper also proposes the S-condition, based on statistical permutation tests, whose purpose is to ensure that the reduction in impurity, or gain, for the selected attribute is statistically significant. We implemented the S-pruning procedure, based on the S-condition, to prevent model overfitting. These methods were evaluated on a number of simulated and benchmark datasets. Experimental results suggest that by tuning the parameters of the impurity measures and by using our S-pruning method, we obtain better decision tree classifiers with the PIDT algorithm.
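The two ideas in the abstract, a parameterised impurity and a permutation test on the gain, can be illustrated with a minimal Python sketch. It is not the PIDT implementation: the abstract does not state the paper's impurity families, so the sketch assumes the Tsallis entropy as a stand-in parameterised concave family (q = 2 gives the Gini index, and the limit q → 1 gives Shannon entropy). The function names, the parameter `q`, the number of permutations and the 0.05 threshold mentioned afterwards are all hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def tsallis_impurity(labels, q=2.0):
    """Tsallis entropy of the class distribution (assumed stand-in family).

    q = 2 reproduces the Gini index; q -> 1 reproduces Shannon entropy.
    """
    n = len(labels)
    if n == 0:
        return 0.0
    probs = [count / n for count in Counter(labels).values()]
    if abs(q - 1.0) < 1e-12:
        # Limit case: Shannon entropy with natural logarithm.
        return -sum(p * math.log(p) for p in probs)
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)


def impurity_gain(values, labels, q=2.0):
    """Reduction in impurity when splitting on a categorical attribute."""
    n = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    weighted = sum(
        len(group) / n * tsallis_impurity(group, q) for group in groups.values()
    )
    return tsallis_impurity(labels, q) - weighted


def permutation_p_value(values, labels, q=2.0, n_perm=500, seed=0):
    """Estimate how often shuffled labels give a gain at least as large.

    A small p-value suggests the observed gain is unlikely under the
    hypothesis that the attribute and the class are unrelated.
    """
    rng = random.Random(seed)
    observed = impurity_gain(values, labels, q)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if impurity_gain(values, shuffled, q) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data: the attribute separates the two classes perfectly.
    attribute = ["a"] * 10 + ["b"] * 10
    classes = [0] * 10 + [1] * 10
    print(impurity_gain(attribute, classes, q=2.0))
    print(permutation_p_value(attribute, classes, q=2.0))
```

Under these assumptions, a tree builder would accept a split only when the estimated p-value falls below a chosen significance level (for example 0.05) and would otherwise turn the node into a leaf, which is one plausible reading of pruning driven by a significance condition. Varying `q` changes how strongly impure nodes are penalised, which is the kind of tuning the abstract refers to.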

Domains

Computer Science [cs]

Main file

467708_1_En_24_Chapter.pdf (635.22 KB)

Origin : Files produced by the author(s)
Licence : CC BY 4.0 - Attribution

https://inria.hal.science/hal-01821078

Submitted on : Friday, June 22, 2018, 11:46:18 AM

Last modification on : Tuesday, July 13, 2021, 4:12:02 PM

Long-term archiving on : Tuesday, September 25, 2018, 10:18:12 AM

Dates and versions

hal-01821078 , version 1 (22-06-2018)

Licence

CC BY 4.0 - Attribution

Identifiers

HAL Id : hal-01821078 , version 1
DOI : 10.1007/978-3-319-92007-8_24

Cite

Daniel Stamate, Wajdi Alghamdi, Daniel Stahl, Doina Logofatu, Alexander Zamyatin. PIDT: A Novel Decision Tree Algorithm Based on Parameterised Impurities and Statistical Pruning Approaches. 14th IFIP International Conference on Artificial Intelligence Applications and Innovations (AIAI), May 2018, Rhodes, Greece. pp.273-284, ⟨10.1007/978-3-319-92007-8_24⟩. ⟨hal-01821078⟩
