impute_vars: Quick impute


View source: R/impute_vars.R

Description

Given a dataset, singly imputes specified variables within that dataset. Meant for tasks that need data to be filled in as an intermediate step, such as filling in a small number of missing values in variables that will be used for raking. Not meant for applications in which measuring the uncertainty due to imputation is important.

Usage

impute_vars(.data, to_impute = NULL, method = "ranger", seed = NA, ...)

Arguments

.data

The data.frame to be imputed.

to_impute

The variables in the dataset for which missing data should be imputed. Can be a character vector of names, a numeric vector of column positions, or a list of columns generated by dplyr::vars. (See help("select_at") for details.) Must include at least two variables. Defaults to NULL, in which case the function searches for variables with the prefix "rk_"; this fits a workflow in which missing data are imputed for variables that will be used in weighting.

method

The imputation method, passed to mice(). The default is random forest imputation via the ranger package, implemented as a custom method that comes with the pewmethods package. Other methods built into the mice package will also work.

seed

A random seed. Setting one ensures that the missing values are filled in exactly the same way each time the function is rerun. Defaults to NA, in which case no seed is set.

...

Other arguments passed to mice::mice.
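
As an illustration of the method and seed arguments, the call below is a sketch that swaps the default for "cart" (one of the methods built into mice) and fixes a seed. It assumes the dec13_excerpt data and the dk_to_na helper used in the Examples section; the seed value 2020 and the object names imputed_a and imputed_b are arbitrary choices made here, not anything prescribed by the package.

```r
library(pewmethods)
library(dplyr)

# Recode refusals in q1 as missing so that there is something to impute.
to_fill <- dec13_excerpt %>%
  mutate(q1 = dk_to_na(q1, pattern = "Refused"))

# Columns used both as imputation targets and as predictors.
vars <- c("q1", "recage", "receduc", "racethn2", "sex")

# "cart" is a built-in mice method; 2020 is an arbitrary seed.
imputed_a <- impute_vars(to_fill, to_impute = vars, method = "cart", seed = 2020)
imputed_b <- impute_vars(to_fill, to_impute = vars, method = "cart", seed = 2020)

# With the same seed, both runs are expected to fill q1 identically.
identical(imputed_a$q1, imputed_b$q1)
```

Leaving seed at its default means the filled-in values may differ from one run to the next.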

Details

This function is a wrapper around mice::mice that performs a single imputation and does not output any diagnostics. Its main use is to quickly impute only some of the variables in a dataset. Quick imputation is useful for limited purposes, such as filling in the generally small amounts of missing data in variables that will be used in raking.

Note that the imputation model will only use data from the variables you pass to this function. If you pass only two variables, then only those two variables and nothing else will be used to fill in missing values. If there are other variables in your data that are strongly related to the variables to be imputed, they should be specified in to_impute, even if they have no missing data.
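
The behavior described above can be approximated with mice directly. The following is a rough sketch of the idea, not the package's actual source: it uses the nhanes data that ships with mice, a hypothetical choice of three columns, and the built-in "pmm" method rather than the ranger-based default. The names dat, vars, imp and filled are illustrative only.

```r
library(mice)

# nhanes ships with mice: age is complete; bmi, hyp and chl contain NAs.
dat <- nhanes

# Hypothetical selection of columns; chl is deliberately left out, so it
# is neither imputed nor used as a predictor.
vars <- c("age", "bmi", "hyp")

# A single imputation (m = 1) fitted on the selected columns only.
imp <- mice(dat[, vars], m = 1, method = "pmm", seed = 123, printFlag = FALSE)

# Write the completed columns back, leaving every other column untouched.
filled <- dat
filled[, vars] <- complete(imp, 1)

# Count the remaining missing values per column.
colSums(is.na(filled))
```

Here age has no missing values but is still included so that it can inform the imputation of bmi and hyp, which mirrors the advice to pass strongly related variables in to_impute even when they are complete.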

Value

The data frame with missing values filled in only for the variables specified in to_impute (by default, the raking variables prefixed "rk_"), leaving all other variables as they were.

Examples

library(pewmethods)
library(dplyr)
# We can use dk_to_na to create new versions of variables where certain factor labels
# are recoded as missing, then impute those variables. If the to_impute argument is not
# specified, the function will by default look for variables starting with "rk_".
dec13_excerpt_raking <- dec13_excerpt %>%
  mutate(rk_sex = sex,
         rk_recage = dk_to_na(recage, pattern = "DK/Ref"),
         rk_receduc = dk_to_na(receduc, pattern = "DK/Ref"),
         rk_racethn2 = dk_to_na(racethn2, pattern = "Ref")) %>%
  impute_vars(.)
# We can also pass specific variables to impute.
# In this example, only q1 has missing data, but we want to fill in q1 based on values of
# age, education, gender and race/ethnicity, so we have to pass those variables in as well.
dec13_excerpt_raking <- dec13_excerpt %>%
  mutate(q1 = dk_to_na(q1, pattern = "Refused")) %>%
  impute_vars(to_impute = c("q1", "recage", "receduc", "racethn2", "sex"))

pewresearch/pewmethods documentation built on March 27, 2020, 7:22 p.m.