Stat Studio Workshop: Missing Data and Multiple Imputations

Missing Data and Multiple Imputation

Abstract

Real-world data are rarely complete; missing data are the rule rather than the exception. Instead of ignoring this fact and analyzing only complete cases, you will learn how to report missing data and account for them in your analysis.

Date
Location
Zoom Video Conferencing (register to receive the link and password)

What is the difference between data that are missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR)? We will cover the difference between these three mechanisms, how to assess (test) which one most plausibly generated the missingness in a specific data set, and what this distinction implies for the options going forward.
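To make the three mechanisms concrete, here is a minimal simulation sketch using only the Python standard library. Everything in it is an illustrative assumption rather than workshop material: the variables `x` and `y`, the 30% MCAR rate, and the logistic missingness models are all hypothetical. It deletes values of `y` under each mechanism and compares the complete-case mean with the mean of the full data.

```python
import math
import random


def sigmoid(z):
    """Logistic function, used here as a hypothetical missingness model."""
    return 1.0 / (1.0 + math.exp(-z))


def observed_mean(values, missing_prob, rng):
    """Mean of the values that survive deletion with the given probabilities."""
    kept = [v for v, p in zip(values, missing_prob) if rng.random() >= p]
    return sum(kept) / len(kept)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 20000

# x is always observed; y depends on x and is subject to missingness.
x = [rng.gauss(0, 1) for _ in range(n)]
y = [xi + rng.gauss(0, 1) for xi in x]
full_mean = sum(y) / n

# MCAR: every y has the same chance of being missing.
mcar_mean = observed_mean(y, [0.3] * n, rng)
# MAR: missingness in y depends only on the observed x.
mar_mean = observed_mean(y, [sigmoid(2 * xi) for xi in x], rng)
# MNAR: missingness in y depends on the unobserved y itself.
mnar_mean = observed_mean(y, [sigmoid(2 * yi) for yi in y], rng)

print(f"full data: {full_mean:.3f}")
print(f"MCAR complete cases: {mcar_mean:.3f}")
print(f"MAR complete cases: {mar_mean:.3f}")
print(f"MNAR complete cases: {mnar_mean:.3f}")
```

In this setup the complete-case mean stays close to the full-data mean only under MCAR. Under MAR the bias can in principle be corrected by modeling missingness from the observed `x`; under MNAR the observed data alone do not carry enough information to do so.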

Historically, simplistic and sub-optimal methods have been used for dealing with missing data: complete case analysis and single-value imputation (mean, regression, hot-deck, etc.). No matter which method is used to deal with missing data, the extent of missingness and the proposed mechanism must always be reported.
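As a small illustration of reporting the extent of missingness, the sketch below counts missing values per variable and the number of complete cases. The function name `missingness_report`, the toy records, and the use of `None` to mark a missing value are assumptions made for this example only.

```python
def missingness_report(rows, columns):
    """Return per-column (count, percent) of missing values and the number of complete cases.

    A value is treated as missing when it is None (an assumption of this sketch).
    """
    n = len(rows)
    report = {}
    for col in columns:
        n_missing = sum(1 for r in rows if r.get(col) is None)
        report[col] = (n_missing, 100.0 * n_missing / n)
    n_complete = sum(
        1 for r in rows if all(r.get(c) is not None for c in columns)
    )
    return report, n_complete


# Hypothetical toy data set with three variables.
rows = [
    {"age": 34, "bmi": 22.1, "income": None},
    {"age": None, "bmi": 27.5, "income": 41000},
    {"age": 51, "bmi": None, "income": None},
    {"age": 29, "bmi": 24.0, "income": 38000},
]
columns = ["age", "bmi", "income"]

report, n_complete = missingness_report(rows, columns)
for col in columns:
    count, pct = report[col]
    print(f"{col}: {count} missing ({pct:.0f}%)")
print(f"complete cases: {n_complete} of {len(rows)}")
```

Reporting both views matters: a few missing values spread across several variables can leave very few complete cases, which is what a complete case analysis would actually use.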

Multiple imputation (MI) is conducted in three stages: imputation, analysis, and pooling. For the imputation stage we will focus on Multivariate Imputation by Chained Equations (MICE), an MCMC algorithm, and for the pooling stage on Rubin’s rules. We will apply this process to various inferential techniques, including t-tests, ANOVA-based methods, and regression analysis.
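The pooling stage can be sketched in a few lines. The example below applies Rubin's rules to a scalar estimate: the pooled estimate is the average across the m imputed data sets, and the total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance. The function name `pool_rubin` and the input numbers are hypothetical; the degrees of freedom use Rubin's original large-sample formula, not a small-sample adjustment.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors with Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)      # pooled point estimate
    u_bar = mean(variances)      # within-imputation variance
    b = variance(estimates)      # between-imputation variance (m - 1 denominator)
    t = u_bar + (1 + 1 / m) * b  # total variance
    if b > 0:
        # Rubin's large-sample degrees of freedom for the pooled t reference distribution.
        df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    else:
        df = math.inf            # no between-imputation variability
    return q_bar, t, math.sqrt(t), df


# Hypothetical estimates of one coefficient from m = 3 imputed data sets,
# each with its squared standard error.
estimates = [1.0, 2.0, 3.0]
variances = [0.5, 0.5, 0.5]

q_bar, t, se, df = pool_rubin(estimates, variances)
print(f"pooled estimate: {q_bar:.3f}")
print(f"total variance: {t:.3f}, standard error: {se:.3f}")
print(f"degrees of freedom: {df:.2f}")
```

The (1 + 1/m) factor reflects that only a finite number of imputations is used, so the pooled standard error is larger than what any single imputed data set would report.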

Application of MI will be expanded to more complex situations: planned missing designs and multilevel/hierarchical sampling structures.