knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
options(useFancyQuotes=FALSE, width=100)
This document will walk you through some of the methods you could use to generate pooled model results that account for both sampling variability and across imputation variability. The package hot.deck
does not come with a set of functions to do inference, so we will show you how you could use the data generated by hot.deck
in combination with glm.mids
(and similarly lm.mids
) from the mice
package, zelig
from the Zelig
package and by using MIcombine
from the mitools
package on a list of model objects.
The data we will use come from @Poeetal1999 dealing with democracy and state repression. First we need to call the hot.deck
routine on the dataset.
library(hot.deck) data(isq99) out <- hot.deck(isq99, sdCutoff=3, IDvars = c("IDORIGIN", "YEAR"))
This shows us that there are still 45 observations with fewer than 5 donors. Using a different method or further widening the sdCutoff
parameter may alleviate the problem. If you want to see the frequency distribution of the number of donors, you could look at:
numdonors <- sapply(out$donors, length) numdonors <- sapply(out$donors, length) numdonors <- ifelse(numdonors > 5, 6, numdonors) numdonors <- factor(numdonors, levels=1:6, labels=c(1:5, ">5")) table(numdonors)
Before running a model, three variables have to be created from those existing. Generally, if variables are deterministic functions of other variables (e.g., transformations, lags, etc...) it is advisable to impute the constituent variables of the calculations and then do the calculations after the fact. Here, we need to lag the AI
variable and create percentage change variables for both population and per-capita GNP. First, to create the lag of AI
, PCGNP
and LPOP
. To do this, we will make a little function.
tscslag <- function(dat, x, id, time){ obs <- apply(dat[, c(id, time)], 1, paste, collapse=".") tm1 <- dat[[time]] - 1 lagobs <- apply(cbind(dat[[id]], tm1), 1, paste, collapse=".") lagx <- dat[match(lagobs, obs), x] } for(i in 1:length(out$data)){ out$data[[i]]$lagAI <- tscslag(out$data[[i]], "AI", "IDORIGIN", "YEAR") out$data[[i]]$lagPCGNP <- tscslag(out$data[[i]], "PCGNP", "IDORIGIN", "YEAR") out$data[[i]]$lagLPOP <- tscslag(out$data[[i]], "LPOP", "IDORIGIN", "YEAR") }
Now, we can use the lagged values of PCGNP
and LPOP
, to create percentage change variables:
for(i in 1:length(out$data)){ out$data[[i]]$pctchgPCGNP <- with(out$data[[i]], c(PCGNP-lagPCGNP)/lagPCGNP) out$data[[i]]$pctchgLPOP <- with(out$data[[i]], c(LPOP-lagLPOP)/lagLPOP) }
You can use the MIcombine
command from the mitools
package to generate inferences, too. Here, you have to produce a list of model estimates and the function will combine across the different results.
# initialize list out <- hd2amelia(out) results <- list() # loop over imputed datasets for(i in 1:length(out$imputations)){ results[[i]] <- lm(AI ~ lagAI + pctchgPCGNP + PCGNP + pctchgLPOP + LPOP + MIL2 + LEFT + BRIT + POLRT + CWARCOW + IWARCOW2, data=out$imputations[[i]]) } summary(mitools::MIcombine(results))
The final method for combining results is to convert the data object returned by the hot.deck
function to an object of class mids
. This can be done with the datalist2mids
function from the miceadds
package.
out.mids <- miceadds::datalist2mids(out$imputations) s <- summary(mice::pool(mice::lm.mids(AI ~ lagAI + pctchgPCGNP + PCGNP + pctchgLPOP + LPOP + MIL2 + LEFT + BRIT + POLRT + CWARCOW + IWARCOW2, data=out.mids))) print(s, digits=4)
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.