impute_vars: Quick impute
In pewresearch/pewmethods: Pew Research Center Methods Miscellaneous Functions


Given a dataset, singly imputes specified variables within that dataset. Meant for tasks that need data to be filled in as an intermediate step, such as filling in a small number of missing values in variables that will be used for raking. Not meant for applications where measuring the uncertainty due to imputation is important.

impute_vars(.data, to_impute = NULL, method = "ranger", seed = NA, ...)

`.data`	The `data.frame` to be imputed.
`to_impute`	The variables in the dataset for which missing data should be imputed. Can be a `character` vector of names, a `numeric` vector of column positions, or a list of columns generated by `dplyr::vars`. (See `help("select_at")` for details.) Must have at least two variables. Defaults to `NULL`, in which case this function will search for variables with the prefix "rk_", meant to fit in a workflow where missing data is imputed for variables to be used in weighting.
`method`	The imputation method, passed to `mice()`. The default is random forest imputation via the `ranger` package, implemented as a custom method that comes with the `pewmethods` package. Other methods built into the `mice` package will also work.
`seed`	A random seed. Setting it ensures that the missing values are filled in exactly the same way when the code is rerun. No seed is set by default (`NA`).
`...`	Other arguments passed to `mice::mice`.

This function is a wrapper around `mice::mice` that performs only one imputation and does not output any diagnostics. Its main use is to quickly impute only some variables in a dataset. Quick imputation is useful for limited purposes, such as filling in the generally small amounts of missing data in variables to be used in raking.

Note that the imputation model will only use data from the variables you pass to this function. If you pass only two variables, then only those two variables and nothing else will be used to fill in missing values. If other variables in your data are strongly related to the variables to be imputed, they should also be specified in `to_impute`, even if they have no missing data.
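To make the behavior described above concrete, here is a minimal sketch of a single-imputation wrapper written directly against `mice`. It is not the `pewmethods` source: the helper name `impute_once` is hypothetical, it uses the built-in `airquality` data, and it swaps in `mice`'s predictive mean matching (`"pmm"`) rather than the custom `ranger` method that ships with `pewmethods`.

```r
library(mice)

# Hypothetical helper (not part of pewmethods): run one imputation with
# mice on the selected columns only, then write the completed columns
# back into the full data frame.
impute_once <- function(data, vars, method = "pmm", seed = NA) {
  # Only `vars` are passed to mice, so only they inform the imputation model
  imp <- mice::mice(data[, vars], m = 1, method = method,
                    seed = seed, printFlag = FALSE)
  # Extract the single completed dataset
  filled <- mice::complete(imp, 1)
  # Replace the selected columns; every other column is left untouched
  data[, vars] <- filled
  data
}

# airquality has missing values in Ozone and Solar.R; Wind and Temp are
# complete but are included because they help predict the other two.
vars <- c("Ozone", "Solar.R", "Wind", "Temp")
imputed <- impute_once(airquality, vars, seed = 123)

anyNA(airquality[, vars])  # missing values present before imputation
anyNA(imputed[, vars])     # none remain afterwards
```

Because only the columns in `vars` are handed to `mice`, `Month` and `Day` play no role in the model here, which mirrors the caveat about listing related predictors in `to_impute`. Passing the same `seed` again should reproduce the same filled-in values.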

The data frame with missing values filled in only for the variables specified in `to_impute` (by default, the "rk_" raking variables), leaving all other variables as they were.

library(dplyr)
library(pewmethods)  # provides impute_vars, dk_to_na and the dec13_excerpt data
# We can use dk_to_na to create new versions of variables where certain factor labels
# are recoded as missing, then impute those variables. If the to_impute argument is not
# specified, the function will by default look for variables starting with "rk_".
dec13_excerpt_raking <- dec13_excerpt %>%
  mutate(rk_sex = sex,
         rk_recage = dk_to_na(recage, pattern = "DK/Ref"),
         rk_receduc = dk_to_na(receduc, pattern = "DK/Ref"),
         rk_racethn2 = dk_to_na(racethn2, pattern = "Ref")) %>%
  impute_vars(.)
# We can also pass specific variables to impute
# In this example, only q1 has missing data, but we want to fill in q1 based on values of
# age, education, gender and race/ethnicity, so we have to pass those variables in as well
dec13_excerpt_raking <- dec13_excerpt %>%
  mutate(q1 = dk_to_na(q1, pattern = "Refused")) %>%
  impute_vars(to_impute = c("q1", "recage", "receduc", "racethn2", "sex"))