Multiple Imputation for Multilevel Data with Continuous and Binary Variables

Vincent Audigier; Ian R. White; Shahab Jolani; Thomas P. A. Debray; Matteo Quartagno; James Carpenter; Stef van Buuren; Matthieu Resche-Rigon

doi:10.1214/18-STS646

Corresponding author: Vincent Audigier

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

We present and compare multiple imputation methods for multilevel continuous and binary data where variables are systematically and sporadically missing. The methods are compared from a theoretical point of view and through an extensive simulation study motivated by a real dataset comprising multiple studies. The comparisons show why these multiple imputation methods are the most appropriate to handle missing values in a multilevel setting and why their relative performances can vary according to the missing data pattern, the multilevel structure and the type of missing variables. This study shows that valid inferences can only be obtained if the dataset includes a large number of clusters. In addition, it highlights that heteroscedastic multiple imputation methods provide more accurate inferences than homoscedastic methods, which should be reserved for data with few individuals per cluster. Finally, guidelines are given to choose the most suitable multiple imputation method according to the structure of the data.
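
The distinction between systematically and sporadically missing variables can be illustrated with a minimal sketch. It assumes the usual reading of the terms: a variable is systematically missing in a cluster when it is unobserved for every individual in that cluster, and sporadically missing when it is unobserved for only some of them. The function name, the toy records, and the `study` and `bmi` field names are hypothetical and do not come from the article.

```python
from collections import defaultdict


def classify_missingness(rows, cluster_key, variable):
    """Label each cluster's missingness pattern for one variable.

    rows: list of dicts in which None marks a missing value (an assumption
    of this sketch, not a convention taken from the article).
    Returns a dict mapping cluster -> "systematic", "sporadic" or "complete".
    """
    values_by_cluster = defaultdict(list)
    for row in rows:
        values_by_cluster[row[cluster_key]].append(row[variable])

    patterns = {}
    for cluster, values in values_by_cluster.items():
        n_missing = sum(v is None for v in values)
        if n_missing == len(values):
            # Never observed in this cluster
            patterns[cluster] = "systematic"
        elif n_missing > 0:
            # Observed for some individuals only
            patterns[cluster] = "sporadic"
        else:
            patterns[cluster] = "complete"
    return patterns


# Hypothetical toy data: three studies (clusters), one continuous variable.
rows = [
    {"study": "A", "bmi": 24.1},
    {"study": "A", "bmi": None},
    {"study": "B", "bmi": None},
    {"study": "B", "bmi": None},
    {"study": "C", "bmi": 27.3},
    {"study": "C", "bmi": 22.8},
]
patterns = classify_missingness(rows, "study", "bmi")
```

In this toy example, study A would be labelled sporadic, study B systematic, and study C complete. A check of this kind is only a descriptive first step before choosing an imputation method; it does not perform any imputation.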

Original language	English
Pages (from-to)	160-183
Number of pages	24
Journal	Statistical Science
Volume	33
Issue number	2
DOIs	https://doi.org/10.1214/18-STS646
Publication status	Published - 1 May 2018

Keywords

Missing data
systematically missing values
multilevel data
mixed data
multiple imputation
joint modelling
fully conditional specification
FULLY CONDITIONAL SPECIFICATION
INDIVIDUAL PATIENT DATA
INTEGRATIVE DATA-ANALYSIS
MIXED-EFFECTS MODELS
MISSING-DATA
CHAINED EQUATIONS
MULTIVARIATE IMPUTATION
OBSERVATIONAL COHORT
RANDOMIZED-TRIALS
BAYESIAN-APPROACH

Access to Document

10.1214/18-STS646 (Licence: Free access - publisher)

https://doi.org/10.1214/18-sts646

Cite this

@article{e32fa41fe1ad401b866a24705d12c76c,
title = "Multiple Imputation for Multilevel Data with Continuous and Binary Variables",
abstract = "We present and compare multiple imputation methods for multilevel continuous and binary data where variables are systematically and sporadically missing. The methods are compared from a theoretical point of view and through an extensive simulation study motivated by a real dataset comprising multiple studies. The comparisons show why these multiple imputation methods are the most appropriate to handle missing values in a multilevel setting and why their relative performances can vary according to the missing data pattern, the multilevel structure and the type of missing variables. This study shows that valid inferences can only be obtained if the dataset includes a large number of clusters. In addition, it highlights that heteroscedastic multiple imputation methods provide more accurate inferences than homoscedastic methods, which should be reserved for data with few individuals per cluster. Finally, guidelines are given to choose the most suitable multiple imputation method according to the structure of the data.",
keywords = "Missing data, systematically missing values, multilevel data, mixed data, multiple imputation, joint modelling, fully conditional specification, FULLY CONDITIONAL SPECIFICATION, INDIVIDUAL PATIENT DATA, INTEGRATIVE DATA-ANALYSIS, MIXED-EFFECTS MODELS, MISSING-DATA, CHAINED EQUATIONS, MULTIVARIATE IMPUTATION, OBSERVATIONAL COHORT, RANDOMIZED-TRIALS, BAYESIAN-APPROACH",
author = "Vincent Audigier and White, {Ian R.} and Shahab Jolani and Debray, {Thomas P. A.} and Matteo Quartagno and James Carpenter and {van Buuren}, Stef and Matthieu Resche-Rigon",
year = "2018",
month = may,
day = "1",
doi = "10.1214/18-STS646",
language = "English",
volume = "33",
pages = "160--183",
journal = "Statistical Science",
issn = "0883-4237",
publisher = "Institute of Mathematical Statistics",
number = "2",
}

TY - JOUR
T1 - Multiple Imputation for Multilevel Data with Continuous and Binary Variables
AU - Audigier, Vincent
AU - White, Ian R.
AU - Jolani, Shahab
AU - Debray, Thomas P. A.
AU - Quartagno, Matteo
AU - Carpenter, James
AU - van Buuren, Stef
AU - Resche-Rigon, Matthieu
PY - 2018/5/1
Y1 - 2018/5/1
N2 - We present and compare multiple imputation methods for multilevel continuous and binary data where variables are systematically and sporadically missing. The methods are compared from a theoretical point of view and through an extensive simulation study motivated by a real dataset comprising multiple studies. The comparisons show why these multiple imputation methods are the most appropriate to handle missing values in a multilevel setting and why their relative performances can vary according to the missing data pattern, the multilevel structure and the type of missing variables. This study shows that valid inferences can only be obtained if the dataset includes a large number of clusters. In addition, it highlights that heteroscedastic multiple imputation methods provide more accurate inferences than homoscedastic methods, which should be reserved for data with few individuals per cluster. Finally, guidelines are given to choose the most suitable multiple imputation method according to the structure of the data.
AB - We present and compare multiple imputation methods for multilevel continuous and binary data where variables are systematically and sporadically missing. The methods are compared from a theoretical point of view and through an extensive simulation study motivated by a real dataset comprising multiple studies. The comparisons show why these multiple imputation methods are the most appropriate to handle missing values in a multilevel setting and why their relative performances can vary according to the missing data pattern, the multilevel structure and the type of missing variables. This study shows that valid inferences can only be obtained if the dataset includes a large number of clusters. In addition, it highlights that heteroscedastic multiple imputation methods provide more accurate inferences than homoscedastic methods, which should be reserved for data with few individuals per cluster. Finally, guidelines are given to choose the most suitable multiple imputation method according to the structure of the data.
KW - Missing data
KW - systematically missing values
KW - multilevel data
KW - mixed data
KW - multiple imputation
KW - joint modelling
KW - fully conditional specification
KW - FULLY CONDITIONAL SPECIFICATION
KW - INDIVIDUAL PATIENT DATA
KW - INTEGRATIVE DATA-ANALYSIS
KW - MIXED-EFFECTS MODELS
KW - MISSING-DATA
KW - CHAINED EQUATIONS
KW - MULTIVARIATE IMPUTATION
KW - OBSERVATIONAL COHORT
KW - RANDOMIZED-TRIALS
KW - BAYESIAN-APPROACH
U2 - 10.1214/18-STS646
DO - 10.1214/18-STS646
M3 - Article
SN - 0883-4237
VL - 33
SP - 160
EP - 183
JO - Statistical Science
JF - Statistical Science
IS - 2
ER -