Bias in regression coefficient estimates when assumptions for handling missing data are violated: a simulation study

S.M.J. van Kuijk; W. Viechtbauer; Ludovicus Peeters; L. Smits

doi:10.2427/11598

Corresponding author: S.M.J. van Kuijk

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Background: The purpose of this simulation study is to compare bias in the estimation of regression coefficients between multiple imputation (MI) and complete case (CC) analysis when assumptions of missing data mechanisms are violated.

Methods: The authors performed a stochastic simulation study in which data were drawn from a multivariate normal distribution, and missing values were created according to different missing data mechanisms (missing completely at random (MCAR), at random (MAR), and not at random (MNAR)). Data were analysed with a linear regression model using CC analysis, and after MI. In addition, characteristics of the data (i.e. correlation, size of the regression coefficients, error variance, proportion of missing data) were varied to assess the influence on the size and sign of bias.

Results: When data were MAR conditional on Y, CC analysis resulted in severely biased regression coefficients; they were consistently underestimated in our scenarios. In the same scenarios, analysis after MI gave correct estimates. Yet, in case of MNAR, MI yielded biased regression coefficients, while CC analysis did not result in biased estimates, contrary to expectation.

Conclusion: The authors demonstrated that MI was only superior to CC analysis in case of MCAR or MAR, with respect to bias and precision. In some scenarios CC may be superior to MI. Often it is not feasible to identify the cause of incomplete data in a given dataset. Therefore, emphasis should be placed on reporting the extent of missing values, the method that was used to address the problem, and the assumptions that were made about the mechanism that caused missing data.
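
The kind of simulation described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the abstract does not give the sample size, coefficient values, missingness models, or imputation procedure, so everything below (one predictor `x` with missing values, a true slope of 0.5, logistic missingness with coefficient 2, and all function names) is hypothetical. The "MI" step is a simplified stochastic regression imputation repeated `m` times and averaged; it does not draw the imputation-model parameters from their posterior, as a full MI procedure would.

```python
import numpy as np


def simulate(n, beta, rng):
    """Draw x ~ N(0, 1) and y = beta * x + N(0, 1) error (assumed design)."""
    x = rng.normal(size=n)
    y = beta * x + rng.normal(size=n)
    return x, y


def observed_mask(x, y, mechanism, rng):
    """Return True where x remains observed under the chosen mechanism."""
    if mechanism == "MCAR":
        # Missingness is unrelated to the data.
        p_missing = np.full(x.shape, 0.5)
    elif mechanism == "MAR":
        # Missingness in x depends on the fully observed outcome y.
        p_missing = 1.0 / (1.0 + np.exp(-2.0 * y))
    elif mechanism == "MNAR":
        # Missingness in x depends on the (unobserved) value of x itself.
        p_missing = 1.0 / (1.0 + np.exp(-2.0 * x))
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    return rng.uniform(size=x.shape) >= p_missing


def slope(x, y):
    """Ordinary least squares slope of y on x, with an intercept."""
    return float(np.polyfit(x, y, 1)[0])


def cc_slope(x, y, obs):
    """Complete case analysis: drop every row where x is missing."""
    return slope(x[obs], y[obs])


def mi_slope(x, y, obs, rng, m=5):
    """Simplified MI: impute x from y plus noise, m times, average the slopes."""
    b, a = np.polyfit(y[obs], x[obs], 1)          # imputation model x ~ y
    resid_sd = (x[obs] - (a + b * y[obs])).std(ddof=2)
    n_missing = int((~obs).sum())
    estimates = []
    for _ in range(m):
        x_imp = x.copy()
        noise = rng.normal(scale=resid_sd, size=n_missing)
        x_imp[~obs] = a + b * y[~obs] + noise
        estimates.append(slope(x_imp, y))
    return float(np.mean(estimates))


def run(n=200_000, beta=0.5, seed=1):
    """Estimate the slope with CC and simplified MI under each mechanism."""
    rng = np.random.default_rng(seed)
    x, y = simulate(n, beta, rng)
    results = {}
    for mechanism in ("MCAR", "MAR", "MNAR"):
        obs = observed_mask(x, y, mechanism, rng)
        results[mechanism] = {
            "CC": cc_slope(x, y, obs),
            "MI": mi_slope(x, y, obs, rng),
        }
    return results


if __name__ == "__main__":
    for mechanism, estimates in run().items():
        print(mechanism, {k: round(v, 3) for k, v in estimates.items()})
```

In this assumed setup, the CC slope under the MAR mechanism should come out clearly below the true value of 0.5, while the imputation-based estimate should stay close to it; under the MNAR mechanism, the CC slope should stay close to 0.5 because selection depends only on the predictor. How far these patterns carry over to other designs depends on the parameters chosen, which is what the study varied.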

Original language	English
Article number	e11598
Pages (from-to)	e11598-1-e11598-8
Number of pages	8
Journal	Epidemiology, Biostatistics and Public Health
Volume	13
Issue number	1
DOIs	https://doi.org/10.2427/11598
Publication status	Published - 1 Jan 2016

Keywords

multiple imputation
complete case analysis
missing data
bias
regression
MULTIPLE IMPUTATION
VALUES

Access to Document

DOI: 10.2427/11598 (Licence: CC BY)

Full text: final published version, 385 KB (Licence: Taverne)

Cite this

@article{366df5f4ec42461295590b31e86d8641,

title = "Bias in regression coefficient estimates when assumptions for handling missing data are violated: a simulation study",

abstract = "Background: The purpose of this simulation study is to compare bias in the estimation of regression coefficients between multiple imputation (MI) and complete case (CC) analysis when assumptions of missing data mechanisms are violated. Methods: The authors performed a stochastic simulation study in which data were drawn from a multivariate normal distribution, and missing values were created according to different missing data mechanisms (missing completely at random (MCAR), at random (MAR), and not at random (MNAR)). Data were analysed with a linear regression model using CC analysis, and after MI. In addition, characteristics of the data (i.e. correlation, size of the regression coefficients, error variance, proportion of missing data) were varied to assess the influence on the size and sign of bias. Results: When data were MAR conditional on Y, CC analysis resulted in severely biased regression coefficients; they were consistently underestimated in our scenarios. In the same scenarios, analysis after MI gave correct estimates. Yet, in case of MNAR, MI yielded biased regression coefficients, while CC analysis did not result in biased estimates, contrary to expectation. Conclusion: The authors demonstrated that MI was only superior to CC analysis in case of MCAR or MAR, with respect to bias and precision. In some scenarios CC may be superior to MI. Often it is not feasible to identify the cause of incomplete data in a given dataset. Therefore, emphasis should be placed on reporting the extent of missing values, the method that was used to address the problem, and the assumptions that were made about the mechanism that caused missing data.",

keywords = "multiple imputation, complete case analysis, missing data, bias, regression, MULTIPLE IMPUTATION, VALUES",

author = "{van Kuijk}, S.M.J. and W. Viechtbauer and Ludovicus Peeters and L. Smits",

year = "2016",

month = jan,

day = "1",

doi = "10.2427/11598",

language = "English",

volume = "13",

pages = "e11598--1--e11598--8",

journal = "Epidemiology, Biostatistics and Public Health",

issn = "2282-0930",

publisher = "Prex",

number = "1",

}

Bias in regression coefficient estimates when assumptions for handling missing data are violated: a simulation study. / van Kuijk, S.M.J.; Viechtbauer, W.; Peeters, Ludovicus et al.
In: Epidemiology, Biostatistics and Public Health, Vol. 13, No. 1, e11598, 01.01.2016, p. e11598-1-e11598-8.

TY - JOUR

T1 - Bias in regression coefficient estimates when assumptions for handling missing data are violated: a simulation study

AU - van Kuijk, S.M.J.

AU - Viechtbauer, W.

AU - Peeters, Ludovicus

AU - Smits, L.

PY - 2016/1/1

Y1 - 2016/1/1

N2 - Background: The purpose of this simulation study is to compare bias in the estimation of regression coefficients between multiple imputation (MI) and complete case (CC) analysis when assumptions of missing data mechanisms are violated. Methods: The authors performed a stochastic simulation study in which data were drawn from a multivariate normal distribution, and missing values were created according to different missing data mechanisms (missing completely at random (MCAR), at random (MAR), and not at random (MNAR)). Data were analysed with a linear regression model using CC analysis, and after MI. In addition, characteristics of the data (i.e. correlation, size of the regression coefficients, error variance, proportion of missing data) were varied to assess the influence on the size and sign of bias. Results: When data were MAR conditional on Y, CC analysis resulted in severely biased regression coefficients; they were consistently underestimated in our scenarios. In the same scenarios, analysis after MI gave correct estimates. Yet, in case of MNAR, MI yielded biased regression coefficients, while CC analysis did not result in biased estimates, contrary to expectation. Conclusion: The authors demonstrated that MI was only superior to CC analysis in case of MCAR or MAR, with respect to bias and precision. In some scenarios CC may be superior to MI. Often it is not feasible to identify the cause of incomplete data in a given dataset. Therefore, emphasis should be placed on reporting the extent of missing values, the method that was used to address the problem, and the assumptions that were made about the mechanism that caused missing data.

AB - Background: The purpose of this simulation study is to compare bias in the estimation of regression coefficients between multiple imputation (MI) and complete case (CC) analysis when assumptions of missing data mechanisms are violated. Methods: The authors performed a stochastic simulation study in which data were drawn from a multivariate normal distribution, and missing values were created according to different missing data mechanisms (missing completely at random (MCAR), at random (MAR), and not at random (MNAR)). Data were analysed with a linear regression model using CC analysis, and after MI. In addition, characteristics of the data (i.e. correlation, size of the regression coefficients, error variance, proportion of missing data) were varied to assess the influence on the size and sign of bias. Results: When data were MAR conditional on Y, CC analysis resulted in severely biased regression coefficients; they were consistently underestimated in our scenarios. In the same scenarios, analysis after MI gave correct estimates. Yet, in case of MNAR, MI yielded biased regression coefficients, while CC analysis did not result in biased estimates, contrary to expectation. Conclusion: The authors demonstrated that MI was only superior to CC analysis in case of MCAR or MAR, with respect to bias and precision. In some scenarios CC may be superior to MI. Often it is not feasible to identify the cause of incomplete data in a given dataset. Therefore, emphasis should be placed on reporting the extent of missing values, the method that was used to address the problem, and the assumptions that were made about the mechanism that caused missing data.

KW - multiple imputation

KW - complete case analysis

KW - missing data

KW - bias

KW - regression

KW - MULTIPLE IMPUTATION

KW - VALUES

U2 - 10.2427/11598

DO - 10.2427/11598

M3 - Article

SN - 2282-0930

VL - 13

SP - e11598-1-e11598-8

JO - Epidemiology, Biostatistics and Public Health

JF - Epidemiology, Biostatistics and Public Health

IS - 1

M1 - e11598

ER -