Regularized K-means through hard-thresholding

Jakob Raymaekers*, Ruben H. Zamar

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

We study a framework for performing regularized K-means, based on direct penalization of the size of the cluster centers. Different penalization strategies are considered and compared in a theoretical analysis and an extensive Monte Carlo simulation study. Based on the results, we propose a new method called hard-threshold K-means (HTK-means), which uses an ℓ0 penalty to induce sparsity. HTK-means is a fast and competitive sparse clustering method which is easily interpretable, as is illustrated on several real data examples. In this context, new graphical displays are presented and used to gain further insight into the data sets.
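The abstract does not give the exact HTK-means criterion. The following is therefore only a minimal sketch under an assumed ℓ0 penalty: on column-centered data, λ times the number of variables whose cluster-center coordinates are not all zero. Under that assumption, once the cluster assignments are fixed, each variable's center column is either the vector of cluster means or exactly zero. A variable is kept only when its drop in within-cluster sum of squares, Σ_c n_c·m_cj², exceeds λ, which is a hard-thresholding step. The function names `hard_threshold_kmeans` and `farthest_point_init` and the parameter `lam` are hypothetical. The greedy max-min seeding is a swapped-in choice and is not necessarily what the authors use.

```python
import numpy as np


def farthest_point_init(X, k, rng):
    """Greedy max-min seeding (hypothetical choice): start from a random row,
    then repeatedly add the row farthest from all centers chosen so far."""
    idx = [int(rng.integers(X.shape[0]))]
    d2 = ((X - X[idx[0]]) ** 2).sum(axis=1)
    for _ in range(1, k):
        idx.append(int(d2.argmax()))
        d2 = np.minimum(d2, ((X - X[idx[-1]]) ** 2).sum(axis=1))
    return X[idx]


def hard_threshold_kmeans(X, k, lam, n_iter=100, seed=0):
    """Hypothetical sketch of an l0-penalized K-means.

    Assumed objective (illustration only, not the paper's exact criterion):
        sum_i ||x_i - mu_{c(i)}||^2 + lam * #{j : column j of the centers != 0},
    on column-centered data, so a zeroed variable sits at the overall mean.
    """
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)
    n, p = X.shape
    rng = np.random.default_rng(seed)
    centers = farthest_point_init(X, k, rng)
    labels = np.full(n, -1)
    keep = np.ones(p, dtype=bool)
    for _ in range(n_iter):
        # Assignment step: each row goes to its nearest (possibly sparse) center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Update step: unpenalized cluster means ...
        counts = np.bincount(labels, minlength=k)
        means = np.zeros((k, p))
        for c in range(k):
            if counts[c] > 0:
                means[c] = X[labels == c].mean(axis=0)
        # ... then hard-threshold each variable: keeping column j lowers the
        # within-cluster sum of squares by sum_c n_c * mean_cj**2, so it is
        # worth the l0 price lam only if that reduction exceeds lam.
        gain = (counts[:, None] * means ** 2).sum(axis=0)
        keep = gain > lam
        centers = means * keep
    return labels, centers, keep
```

For example, with two groups separated only in the first of five variables, a moderate `lam` should keep that variable and zero out the four noise variables. The returned centers are then nonzero in that single coordinate, which is the kind of interpretability the abstract refers to. Each step (assignment, then update with thresholding) can only lower the assumed objective, so the loop stops once the labels no longer change.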

Original language: English
Pages (from-to): 1-48
Number of pages: 48
Journal: Journal of Machine Learning Research
Volume: 23
Issue number: 93
Publication status: Published - 1 Apr 2022

Keywords

  • clustering
  • penalized
  • variable selection
  • ℓ0
  • DIVERGING NUMBER
  • CLUSTERS
  • LASSO
