e_data_complete_by_variable_subset: For missing data, determine which sets of variables result in...

View source: R/e_data_complete_by_variable_subset.R

e_data_complete_by_variable_subset    R Documentation

For data with missing values, determine which sets of variables yield the largest number of complete observations

Description

For data with missing values, determine which sets of variables yield the largest number of complete observations.
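
To make the idea concrete, here is a minimal base R sketch of counting complete observations for every subset of variables. It is an illustration only and not the erikmisc implementation: the toy data frame `dat`, the helper objects `vars`, `subsets`, and `res`, and the sort order are all assumptions made for this example; the column names merely mirror those described under Value.

```r
# Hypothetical illustration in base R; not how erikmisc implements the function.
dat <- data.frame(
  a = c(1, NA, 3, 4, 5),
  b = c(1, 2, NA, 4, 5),
  c = c(NA, NA, 3, 4, NA)
)

vars <- names(dat)

# Enumerate every non-empty subset of the variable names
subsets <- unlist(
  lapply(seq_along(vars), function(k) combn(vars, k, simplify = FALSE)),
  recursive = FALSE
)

# For each subset, count the rows with no missing values in those columns
res <- data.frame(
  n_complete = vapply(
    subsets,
    function(v) sum(complete.cases(dat[, v, drop = FALSE])),
    integer(1)
  ),
  n_var     = lengths(subsets),
  var_names = vapply(subsets, paste, character(1), collapse = ", ")
)

# Assumed ordering: most complete observations first, then more variables
res <- res[order(-res$n_complete, -res$n_var), ]
print(res, row.names = FALSE)
```

Enumerating all subsets grows as 2^p in the number of variables p, so a brute-force sketch like this is only practical for a small number of columns.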

Usage

e_data_complete_by_variable_subset(dat, var_list = NULL, var_resp = NULL)

Arguments

dat

data, as a data.frame or tibble

var_list

list of variables to consider; NULL for all variables in dat

var_resp

NULL, or the name of one variable that is always included (the data are filtered to keep only observations with this variable)

Value

out, a tibble with the columns n_complete, n_var, and var_names_print, plus a list of variable names in var_names

Examples

# Generate missing values
dat_mtcars_miss_e <- dat_mtcars_e
prop_missing <- 0.10
# Despite its name, n_missing holds the sampled cell positions, not a count
n_missing <-
  sample.int(
    n    = prod(dim(dat_mtcars_miss_e))
  , size = round( prop_missing * prod(dim(dat_mtcars_miss_e)))
  )
# Convert the sampled cell positions to (row, column) index pairs
ind_missing <- expand.grid(1:dim(dat_mtcars_miss_e)[1], 1:dim(dat_mtcars_miss_e)[2])[n_missing, ]
# Set each sampled cell to NA
for (i_row in seq_along(n_missing)) {
  dat_mtcars_miss_e[ind_missing[i_row, 1], ind_missing[i_row, 2] ] <- NA
}

# Plot missing data
dat_mtcars_miss_e |> e_plot_missing()

# Count complete observations for each subset of variables
out <- dat_mtcars_miss_e |> e_data_complete_by_variable_subset()
# Print table
out |> print(n = Inf, width = Inf)
# Print variable names from first row
out$var_names[1] |> unlist()

erikerhardt/erikmisc documentation built on April 17, 2025, 10:48 a.m.