The R Package hmi: A Convenient Tool for Hierarchical Multiple Imputation and Beyond

Matthias Speidel*, Joerg Drechsler, Shahab Jolani

*Corresponding author for this work

Research output: Contribution to journal, Article, Academic, peer-review

3 Citations (Web of Science)

Abstract

Applications of multiple imputation have long outgrown the traditional context of dealing with item nonresponse in cross-sectional data sets. Nowadays, multiple imputation is also applied to impute missing values in hierarchical data sets, address confidentiality concerns, combine data from different sources, or correct measurement errors in surveys. However, software development has not kept up with these recent extensions. Most imputation software can only deal with item nonresponse in cross-sectional settings, and extensions for hierarchical data, if available at all, are typically limited in scope. Furthermore, to our knowledge, no software is currently available for dealing with measurement error using multiple imputation approaches.

The R package hmi tries to close some of these gaps. It offers multiple imputation routines in hierarchical settings for many variable types (for example, nominal, ordinal, or continuous variables). It also provides imputation routines for interval data and handles a common measurement error problem in survey data: biased inferences due to implicit rounding of the reported values. The user-friendly setup, which requires only the data and, optionally, the specification of the analysis model of interest, makes the package especially attractive for users less familiar with the peculiarities of multiple imputation. Compatibility with the popular mice package (Van Buuren and Groothuis-Oudshoorn 2011) ensures that the rich set of analysis, diagnostic, and post-imputation tools available in mice can be used easily once the data have been imputed.
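The implicit-rounding problem mentioned above can be illustrated with a small simulation. The following is a minimal sketch in Python, not the imputation procedure implemented in hmi. All numbers are assumptions chosen for illustration: the lognormal income distribution, the 50% share of respondents who round to the nearest 1000, and the threshold of 1250. The helper `round_to_nearest` is likewise hypothetical. The sketch only shows why heaped reports can bias a simple estimate such as the share of values below a threshold.

```python
import math
import random


def round_to_nearest(value, base):
    """Round half up to the nearest multiple of `base` (hypothetical helper)."""
    return math.floor(value / base + 0.5) * base


rng = random.Random(42)  # fixed seed so the run is reproducible
n = 10_000

# Assumed "true" incomes: lognormal with a median of roughly 1480.
true_income = [rng.lognormvariate(7.3, 0.6) for _ in range(n)]

# Assumption: about half of the respondents report a value rounded to the
# nearest 1000; the others report the exact value.
reported_income = [
    round_to_nearest(x, 1000) if rng.random() < 0.5 else x
    for x in true_income
]

# Share of incomes below an assumed threshold, before and after rounding.
threshold = 1250
true_share = sum(x < threshold for x in true_income) / n
reported_share = sum(x < threshold for x in reported_income) / n

print(f"share below {threshold} (true values):     {true_share:.3f}")
print(f"share below {threshold} (reported values): {reported_share:.3f}")
```

In this setup, every true value between 1250 and 1500 that gets rounded is reported as 1000 and therefore falls below the threshold, while no value moves in the opposite direction. The share computed from the reported values is therefore inflated relative to the share computed from the true values.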

Original language: English
Pages (from-to): 1-48
Number of pages: 48
Journal: Journal of Statistical Software
Volume: 95
Issue number: 9
Publication status: Published - Oct 2020

Keywords

  • hierarchical data
  • multiple imputation
  • multilevel models
  • measurement error
  • heaping
  • R
  • MISSING DATA
  • CHAINED EQUATIONS
  • BAYESIAN-APPROACH
  • TIME
  • SEROCONVERSION
  • ADDRESS
  • HIV-1
  • MI
