Sparse regression for large data sets with outliers

L. Bottmer, C. Croux, I. Wilms*

*Corresponding author for this work

Research output: Contribution to journal / Article / Academic / peer-review

2 Citations (Web of Science)

Abstract

The linear regression model remains an important workhorse for data scientists. However, many data sets contain far more predictors than observations. In addition, outliers, or anomalies, frequently occur. This paper proposes an algorithm for regression analysis, which we call “sparse shooting S”, that addresses these features typical of big data sets. The resulting regression coefficients are sparse, meaning that many of them are set to zero, thereby selecting the most relevant predictors. A distinctive feature of the method is its robustness to outliers in individual cells of the data matrix. A simulation study shows the excellent performance of this robust variable selection and prediction method, and a real-data application on car fuel consumption demonstrates its usefulness.
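The abstract describes the method only at a high level. To illustrate the two ingredients it combines, sparsity and robustness, here is a minimal sketch of a Huber-weighted lasso fitted by cyclic coordinate descent (the classic "shooting" algorithm for the lasso), assuming centred data and no intercept. This is not the authors' sparse shooting S. It swaps the S-estimator for a simpler Huber loss, and it downweights whole observations with large residuals rather than individual cells of the data matrix. The function names (`soft_threshold`, `huber_weights`, `robust_lasso_shooting`), the Huber constant `c=1.345` and the penalty `lam` are hypothetical, illustrative choices.

```python
import numpy as np


def soft_threshold(z, lam):
    """Shrink z toward 0; return exactly 0 when |z| <= lam (source of sparsity)."""
    return np.sign(z) * max(abs(z) - lam, 0.0)


def huber_weights(r, c=1.345):
    """IRLS weights for the Huber loss, using a MAD-based robust scale of r."""
    scale = np.median(np.abs(r - np.median(r))) / 0.6745
    scale = max(scale, 1e-12)  # guard against a zero scale
    u = np.abs(r) / scale
    # weight 1 for small standardized residuals, c/|u| for large ones
    return np.where(u <= c, 1.0, c / np.maximum(u, 1e-12))


def robust_lasso_shooting(X, y, lam, n_outer=30, n_inner=50):
    """Hypothetical Huber-weighted lasso via cyclic coordinate descent.

    Each outer loop fixes weights w and minimises
        (1/2n) * sum_i w_i * (y_i - x_i' beta)^2 + lam * ||beta||_1.
    Assumes centred data and no intercept; not the paper's estimator.
    """
    n, p = X.shape
    beta = np.zeros(p)
    r = np.asarray(y, dtype=float).copy()  # residuals y - X @ beta, beta = 0
    for _ in range(n_outer):
        w = huber_weights(r)  # downweight observations with large residuals
        d = (w[:, None] * X**2).mean(axis=0)  # weighted curvature per coordinate
        for _ in range(n_inner):
            for j in range(p):
                if d[j] == 0.0:
                    continue
                r_partial = r + X[:, j] * beta[j]  # residual without predictor j
                rho = np.mean(w * X[:, j] * r_partial)
                new_bj = soft_threshold(rho, lam) / d[j]
                r = r_partial - X[:, j] * new_bj  # keep residuals in sync
                beta[j] = new_bj
    return beta, w


# Usage on simulated data: 3 relevant predictors out of 20, 10 outlying responses
rng = np.random.default_rng(0)
n, p = 200, 20
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]
y = X @ beta_true + 0.5 * rng.standard_normal(n)
y[:10] += 20.0  # contaminate the first 10 responses

beta_hat, w = robust_lasso_shooting(X, y, lam=0.2)
print(np.round(beta_hat, 2))
print("selected predictors:", np.flatnonzero(beta_hat))
```

The soft-thresholding step is what sets coefficients exactly to zero: whenever the weighted correlation `rho` between a predictor and its partial residual is at most `lam` in absolute value, that coefficient is dropped. The Huber weights give the ten contaminated responses weights well below one, which limits their influence on the fit.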
Original language: English
Pages (from-to): 782-794
Number of pages: 13
Journal: European Journal of Operational Research
Volume: 297
Issue number: 2
Publication status: Published - 1 Mar 2022

Keywords

  • Data science
  • Lasso
  • Outliers
  • Robust regression
  • Variable selection
  • HIGH-DIMENSIONAL DATA
  • SELECTION
  • ROBUST
  • REGULARIZATION
  • SALES
  • INFORMATION
  • MODELS
