A New Approach to Determine the Optimal Number of Clusters Based on the Gap Statistic

Jaekyung Yang; Jong-Yeong Lee; Myoungjin Choi; Yeongin Joo

doi:10.1007/978-3-030-45778-5_15

Conference paper, Year: 2020

Jaekyung Yang (Chonbuk National University)
Jong-Yeong Lee (Chonbuk National University)
Myoungjin Choi (Howon University)
Yeongin Joo (Chonbuk National University)

Abstract

Data clustering is one of the most important unsupervised classification methods. It aims to organize objects into groups (or clusters) such that members of the same cluster are similar in some way and members of different clusters are dissimilar. Among general clustering methods, k-means is arguably the most popular. However, it still has some inherent weaknesses. One of the biggest challenges in using k-means is determining the optimal number of clusters, k. Although many approaches have been suggested in the literature, this is still considered an unsolved problem. In this study, we propose a new technique that improves the gap statistic approach for selecting k. It has been tested on different datasets, on which it yields superior results compared to the original gap statistic. We expect our new method to also work well with other clustering algorithms that require the number of clusters k, because, like the gap statistic, it can work with any clustering method.

Main file: 487577_1_En_15_Chapter.pdf (442.29 KB)
Origin: Files produced by the author(s)
Licence: CC BY 4.0 - Attribution

https://inria.hal.science/hal-03266454

Submitted on : Monday, June 21, 2021-5:31:20 PM

Last modification on : Friday, July 30, 2021-2:50:13 PM

Long-term archiving on : Wednesday, September 22, 2021-7:01:06 PM

Dates and versions

hal-03266454 , version 1 (21-06-2021)

Identifiers

HAL Id : hal-03266454 , version 1
DOI : 10.1007/978-3-030-45778-5_15

Cite

Jaekyung Yang, Jong-Yeong Lee, Myoungjin Choi, Yeongin Joo. A New Approach to Determine the Optimal Number of Clusters Based on the Gap Statistic. 2nd International Conference on Machine Learning for Networking (MLN), Dec 2019, Paris, France. pp.227-239, ⟨10.1007/978-3-030-45778-5_15⟩. ⟨hal-03266454⟩
