Bias in regression coefficient estimates when assumptions for handling missing data are violated: a simulation study

S.M.J. van Kuijk*, W. Viechtbauer, Ludovicus Peeters, L. Smits

*Corresponding author for this work

Research output: Contribution to journal, Article, Academic, peer-review



Background: The purpose of this simulation study is to compare bias in the estimation of regression coefficients between multiple imputation (MI) and complete case (CC) analysis when assumptions about the missing data mechanism are violated.

Methods: The authors performed a stochastic simulation study in which data were drawn from a multivariate normal distribution and missing values were created according to different missing data mechanisms: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). Data were analysed with a linear regression model, both by CC analysis and after MI. In addition, characteristics of the data (i.e. correlation, size of the regression coefficients, error variance, and proportion of missing data) were varied to assess their influence on the size and sign of the bias.

Results: When data were MAR conditional on Y, CC analysis resulted in severely biased regression coefficients; they were consistently underestimated in our scenarios. In the same scenarios, analysis after MI gave unbiased estimates. Yet, in the case of MNAR, MI yielded biased regression coefficients, while, contrary to expectation, CC analysis did not result in biased estimates.
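The contrast between the mechanisms can be illustrated with a minimal Python sketch (standard library only). It is not the authors' simulation code, and every name in it is hypothetical. Assumptions made for illustration: a single predictor X with outcome Y = beta0 + beta1*X + error, missing values only in X (50% by default), deterministic "top-k" selection rules for MAR (largest Y values) and MNAR (largest X values), and a simplified, improper imputation that ignores uncertainty in the imputation-model parameters.

```python
import random
import statistics

# Hypothetical sketch, not the authors' code. Assumptions: one predictor X,
# outcome Y = beta0 + beta1*X + error, missing values only in X, and
# deterministic "top-k" selection rules for the MAR and MNAR mechanisms.


def ols(xs, ys):
    """Intercept and slope of the least-squares regression of ys on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


def simulate(n, beta0, beta1, sigma, rng):
    """X ~ N(0, 1) and Y = beta0 + beta1*X + N(0, sigma^2), so (X, Y) is bivariate normal."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys


def make_missing(xs, ys, mechanism, prop, rng):
    """Replace a proportion `prop` of the X values with None."""
    n, k = len(xs), int(prop * len(xs))
    if mechanism == "MCAR":    # random subset, unrelated to any data values
        drop = set(rng.sample(range(n), k))
    elif mechanism == "MAR":   # the k largest Y values (Y is always observed)
        drop = set(sorted(range(n), key=lambda i: ys[i])[n - k:])
    elif mechanism == "MNAR":  # the k largest X values (depends on X itself)
        drop = set(sorted(range(n), key=lambda i: xs[i])[n - k:])
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    return [None if i in drop else x for i, x in enumerate(xs)]


def complete_case(xs_obs, ys):
    """Slope of Y on X using only rows where X is observed."""
    pairs = [(x, y) for x, y in zip(xs_obs, ys) if x is not None]
    return ols([x for x, _ in pairs], [y for _, y in pairs])[1]


def simple_mi(xs_obs, ys, m, rng):
    """Simplified (improper) MI: impute X from a regression of X on Y fitted to
    the complete cases, plus residual noise; repeat m times and average the
    m slopes (the Rubin's-rules point estimate). Parameter uncertainty of the
    imputation model is ignored, unlike a proper MI."""
    cc_x = [x for x in xs_obs if x is not None]
    cc_y = [y for x, y in zip(xs_obs, ys) if x is not None]
    a, b = ols(cc_y, cc_x)  # imputation model: X regressed on Y
    resid_sd = statistics.stdev([x - (a + b * y) for x, y in zip(cc_x, cc_y)])
    slopes = []
    for _ in range(m):
        filled = [x if x is not None else a + b * y + rng.gauss(0.0, resid_sd)
                  for x, y in zip(xs_obs, ys)]
        slopes.append(ols(filled, ys)[1])
    return statistics.fmean(slopes)


def run(mechanism, n=20000, beta0=0.0, beta1=1.0, sigma=1.0, prop=0.5, m=5, seed=1):
    """Return (CC slope, MI slope) for one simulated data set."""
    rng = random.Random(seed)
    xs, ys = simulate(n, beta0, beta1, sigma, rng)
    xs_obs = make_missing(xs, ys, mechanism, prop, rng)
    return complete_case(xs_obs, ys), simple_mi(xs_obs, ys, m, rng)


if __name__ == "__main__":
    for mech in ("MCAR", "MAR", "MNAR"):
        cc, mi = run(mech)
        print(f"{mech:4s}  CC slope = {cc:.3f}  MI slope = {mi:.3f}  (true slope = 1.0)")
```

Under these assumptions, the CC slope should be attenuated when missingness in X depends on Y, whereas the imputation-based slope should move away from the true value when missingness depends on X itself, which is the pattern described in the Results. A proper MI procedure would additionally draw the imputation-model parameters from their posterior distribution and pool variances with Rubin's rules.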

Conclusion: The authors demonstrated that MI was superior to CC analysis, with respect to bias and precision, only in the case of MCAR or MAR. In some scenarios, CC analysis may be superior to MI. Because it is often not feasible to identify the cause of incomplete data in a given dataset, emphasis should be placed on reporting the extent of missing values, the method used to address them, and the assumptions made about the mechanism that caused the missing data.

Original language: English
Article number: e11598
Pages (from-to): e11598-1 to e11598-8
Number of pages: 8
Journal: Epidemiology, Biostatistics and Public Health
Issue number: 1
Publication status: Published, 1 Jan 2016


  • multiple imputation
  • complete case analysis
  • missing data
  • bias
  • regression
