knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
Here, we break down some differences between the multiple imputation statistics generated with the mice package in R and with the mi commands in Stata. To do so, we use the multiple imputation data generated in Stata with the lassopmm sample code.
Using R for the same kind of analysis on the same data.
suppressMessages(library(dplyr))
suppressMessages(library(tidyr))
suppressMessages(library(mice))
suppressMessages(library(mitools))
stata_pool_data <- haven::read_dta("DTA-FILE-NAME.dta") %>% haven::zap_labels()
stata_pool_data <- haven::read_dta( "F:\\Drive\\WB\\leonardo\\learning\\lassopmm_support\\base_pmm_example.dta") %>% haven::zap_labels()
Now we reshape the data into the mice-like data structure, where we have the .id and .imp variables and the imputed data itself.
stata_pmm_part <- stata_pool_data %>%
  filter(is.na(price)) %>%
  select(mpg, weight, contains("price")) %>%
  mutate(.id = row_number()) %>%
  # Stack the price columns (original plus imputations) into long format
  gather(type, price, 3:(length(.) - 1)) %>%
  {
    # Number the stacked columns 0, 1, 2, ...; .imp = 0 is meant to be the original (missing) price
    a <- (.) %>% distinct(type) %>% mutate(.imp = 0:(nrow(.) - 1))
    (.) %>% left_join(a, by = "type")
  } %>%
  select(-type) %>%
  select(.imp, .id, everything())

glimpse(stata_pmm_part)
Finally, we convert this data to the mice form and run a basic analysis of the means and standard deviations. To estimate the mean of multiply imputed data in R with the mice package, we need to use regression methods. We regress the variable of interest on an intercept only and summarise the statistics pooled over the imputed data sets. For more information, see the book Flexible Imputation of Missing Data, specifically sections 2.3 and 2.4.
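The intercept-only regression works because weighted least squares with only an intercept returns the weighted mean. A minimal base R sketch of that equivalence follows; the `price` and `weight` values here are hypothetical toy numbers, not taken from the Stata file.

```r
# Hypothetical toy data (an assumption for illustration only)
price  <- c(4000, 5500, 6200, 9000, 12000)
weight <- c(2500, 2800, 3100, 3600, 4100)

# Intercept-only weighted regression
fit <- lm(price ~ 1, weights = weight)

# The fitted intercept and the weighted mean should agree
intercept <- unname(coef(fit)[1])
wmean     <- weighted.mean(price, weight)
all.equal(intercept, wmean)
```

The same model applied to each imputed data set gives one mean and one standard error per imputation, which is what the pooling step then combines.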
mice package
stt <- as.mids(stata_pmm_part)
fit <- with(stt, lm(price ~ 1, weights = weight))
est <- pool(fit)
est # Getting pool results
summary(est, conf.int = TRUE) # Getting pooled LM summary
The R results closely match the Stata results!
In the pooled output, estimate stands for the mean, t for the variance and t^0.5 for the standard error std.error, riv for the Average RVI, fmi for the Largest FMI, and df for the degrees of freedom statistics. The numbers are almost identical to Stata (see below).
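For orientation, the pooled quantities above follow Rubin's rules. The notation below is an assumption introduced here rather than taken from the package output: $\hat{Q}_l$ is the estimate and $U_l$ its squared standard error in imputation $l = 1, \dots, m$.

$$
\begin{aligned}
\bar{Q} &= \frac{1}{m} \sum_{l=1}^{m} \hat{Q}_l && \text{(estimate)} \\
\bar{U} &= \frac{1}{m} \sum_{l=1}^{m} U_l && \text{(within-imputation variance)} \\
B &= \frac{1}{m-1} \sum_{l=1}^{m} \left(\hat{Q}_l - \bar{Q}\right)^2 && \text{(between-imputation variance)} \\
T &= \bar{U} + \left(1 + \frac{1}{m}\right) B && \text{(total variance, } t\text{)} \\
\mathrm{riv} &= \frac{\left(1 + \frac{1}{m}\right) B}{\bar{U}} && \text{(relative increase in variance)}
\end{aligned}
$$

The standard error is then $\sqrt{T}$, which corresponds to t^0.5 in the description above.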
Stata results:
. mi estimate: mean price if samples==1 [aw=weight]
(5 values of imputed variable price in m>0 updated to match values in m=0)

Multiple-imputation estimates     Imputations       =          5
Mean estimation                   Number of obs     =          7
                                  Average RVI       =     0.1190
                                  Largest FMI       =     0.1737
                                  Complete DF       =          6
DF adjustment:   Small sample     DF:     min       =       4.12
                                          avg       =       4.12
Within VCE type:     Analytic             max       =       4.12

--------------------------------------------------------------
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
       price |   6711.486    1521.65      2535.511    10887.46
--------------------------------------------------------------
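As a cross-check, the small-sample degrees of freedom in the Stata output can be reproduced by hand. The sketch below assumes the Barnard-Rubin adjustment and uses only the numbers printed above (Imputations, Complete DF, Average RVI); the variable names are illustrative, and the RVI is rounded as printed, so the result is approximate.

```r
# Values copied from the Stata output above
m     <- 5       # Imputations
dfcom <- 6       # Complete DF
riv   <- 0.1190  # Average RVI (rounded as printed)

# Proportion of total variance due to missingness, implied by riv
lambda <- riv / (1 + riv)

# Large-sample degrees of freedom
df_old <- (m - 1) / lambda^2

# Observed-data degrees of freedom given the complete-data df
df_obs <- (dfcom + 1) / (dfcom + 3) * dfcom * (1 - lambda)

# Barnard-Rubin small-sample adjusted degrees of freedom
df_adj <- df_old * df_obs / (df_old + df_obs)
round(df_adj, 2)  # about 4.12, in line with the DF reported by Stata
```

Because the complete-data degrees of freedom are so small here (6), the adjusted value is driven almost entirely by `df_obs` rather than by the number of imputations.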
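# Same analysis with mitools: build a list with one data frame per imputation
dda <- stata_pmm_part %>%
  filter(!is.na(price)) %>%   # drops the .imp = 0 rows, where price is missing
  rename(id = .id) %>%
  group_by(.imp) %>%
  nest() %>%
  ungroup() %>%               # so that select() does not keep the grouping column
  select(data) %>%
  unlist(recursive = FALSE, use.names = FALSE)

dda_mi <- imputationList(dda)
model <- with(dda_mi, expr = lm(price ~ 1, weights = weight))
MIcombine(model)
summary(MIcombine(model))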