Promissing: Pruning Missing Values in Neural Networks

Seyed Mostafa Kia; Nastaran Mohammadian Rad; Daniel van Opstal; Bart van Schie; Andre F. Marquand; Josien Pluim; Wiepke Cahn; Hugo G. Schnack

Research output: Working paper / Preprint › Preprint

Abstract

While data are the primary fuel for machine learning models, they often suffer from missing values, especially when collected in real-world scenarios. However, many off-the-shelf machine learning models, including artificial neural network models, are unable to handle these missing values directly. Therefore, extra data preprocessing and curation steps, such as data imputation, are inevitable before learning and prediction processes. In this study, we propose a simple and intuitive yet effective method for pruning missing values (PROMISSING) during learning and inference steps in neural networks. In this method, there is no need to remove or impute the missing values; instead, the missing values are treated as a new source of information (representing what we do not know). Our experiments on simulated data, several classification and regression benchmarks, and a multi-modal clinical dataset show that PROMISSING achieves prediction performance similar to that of various imputation techniques. In addition, our experiments show that models trained using PROMISSING techniques become less decisive in their predictions when facing incomplete samples with many unknowns. We hope this finding helps advance machine learning models from being pure predicting machines to more realistic thinkers that can also say "I do not know" when facing incomplete sources of information.
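As a rough illustration of the idea in the abstract, the sketch below shows one plausible way to "prune" missing inputs in a single dense layer, so that a missing feature contributes nothing to the layer's output. This is an assumption made for illustration, not the authors' implementation: the function names (`pruned_dense`, `sigmoid`), the weights, and the zero bias are all hypothetical, and the paper's actual formulation (for example, how it treats the bias term) may differ. In the forward pass this coincides numerically with zero-filling the inputs of this one layer.

```python
import numpy as np

def pruned_dense(x, W, b):
    """Dense layer that skips NaN inputs instead of imputing them.

    x: (n_samples, n_features) array, NaN marks a missing value.
    W: (n_features, n_units) weights; b: (n_units,) bias.
    Hypothetical helper for illustration only.
    """
    observed = ~np.isnan(x)                # True where a value is present
    x_pruned = np.where(observed, x, 0.0)  # missing inputs contribute nothing
    return x_pruned @ W + b

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up weights and a zero bias, chosen only to keep the arithmetic visible.
W = np.array([[2.0], [-1.0], [3.0]])
b = np.zeros(1)

x = np.array([
    [1.0, 0.5, 1.0],           # complete sample
    [1.0, np.nan, np.nan],     # two of three features missing
    [np.nan, np.nan, np.nan],  # nothing observed
])

probs = sigmoid(pruned_dense(x, W, b)).ravel()
print(probs)
```

Under these assumptions, the sample with no observed features yields a pre-activation of zero and therefore a probability of exactly 0.5, a toy analogue of the "less decisive" behaviour the abstract reports for samples with many unknowns.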

Original language	English
Publication status	Published - 3 Jun 2022
Externally published	Yes

Keywords

cs.LG
cs.AI
stat.ME

Access to Document

arXiv:2206.01640v1

Cite this

@techreport{d92103c61b2144bab1311886ff530c49,

title = "Promissing: Pruning Missing Values in Neural Networks",

abstract = "While data are the primary fuel for machine learning models, they often suffer from missing values, especially when collected in real-world scenarios. However, many off-the-shelf machine learning models, including artificial neural network models, are unable to handle these missing values directly. Therefore, extra data preprocessing and curation steps, such as data imputation, are inevitable before learning and prediction processes. In this study, we propose a simple and intuitive yet effective method for pruning missing values (PROMISSING) during learning and inference steps in neural networks. In this method, there is no need to remove or impute the missing values; instead, the missing values are treated as a new source of information (representing what we do not know). Our experiments on simulated data, several classification and regression benchmarks, and a multi-modal clinical dataset show that PROMISSING results in similar prediction performance compared to various imputation techniques. In addition, our experiments show models trained using PROMISSING techniques are becoming less decisive in their predictions when facing incomplete samples with many unknowns. This finding hopefully advances machine learning models from being pure predicting machines to more realistic thinkers that can also say {"}I do not know{"} when facing incomplete sources of information.",

keywords = "cs.LG, cs.AI, stat.ME",

author = "Kia, {Seyed Mostafa} and Rad, {Nastaran Mohammadian} and Opstal, {Daniel van} and Schie, {Bart van} and Marquand, {Andre F.} and Pluim, Josien and Cahn, Wiepke and Schnack, {Hugo G.}",

year = "2022",

month = jun,

day = "3",

language = "English",

type = "WorkingPaper",

}

TY - UNPB

T1 - Promissing

T2 - Pruning Missing Values in Neural Networks

AU - Kia, Seyed Mostafa

AU - Rad, Nastaran Mohammadian

AU - Opstal, Daniel van

AU - Schie, Bart van

AU - Marquand, Andre F.

AU - Pluim, Josien

AU - Cahn, Wiepke

AU - Schnack, Hugo G.

PY - 2022/6/3

Y1 - 2022/6/3

N2 - While data are the primary fuel for machine learning models, they often suffer from missing values, especially when collected in real-world scenarios. However, many off-the-shelf machine learning models, including artificial neural network models, are unable to handle these missing values directly. Therefore, extra data preprocessing and curation steps, such as data imputation, are inevitable before learning and prediction processes. In this study, we propose a simple and intuitive yet effective method for pruning missing values (PROMISSING) during learning and inference steps in neural networks. In this method, there is no need to remove or impute the missing values; instead, the missing values are treated as a new source of information (representing what we do not know). Our experiments on simulated data, several classification and regression benchmarks, and a multi-modal clinical dataset show that PROMISSING results in similar prediction performance compared to various imputation techniques. In addition, our experiments show models trained using PROMISSING techniques are becoming less decisive in their predictions when facing incomplete samples with many unknowns. This finding hopefully advances machine learning models from being pure predicting machines to more realistic thinkers that can also say "I do not know" when facing incomplete sources of information.

KW - cs.LG

KW - cs.AI

KW - stat.ME

M3 - Preprint

BT - Promissing

ER -