process_missing: Preprocess Data to Handle Missing Variables

View source: R/process_missing.R

process_missing    R Documentation

Preprocess Data to Handle Missing Variables

Description

Processes data to account for missingness in preparation for TMLE.

Usage

process_missing(
  data,
  node_list,
  complete_nodes = c("A", "Y"),
  impute_nodes = NULL,
  max_p_missing = 0.5
)

Arguments

data

data.table containing the variables with missing values

node_list

list specifying which variables comprise each node

complete_nodes

character vector of nodes that must be observed

impute_nodes

character vector of nodes that will be imputed

max_p_missing

numeric, the maximum tolerable proportion of missing values. A variable with more missingness than this is dropped from the analysis

Details

Rows with a missing value in any of the complete_nodes are dropped. Missing values in the variables of the impute_nodes are then median-imputed, and indicator variables of missingness are generated for these nodes.
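
The row-dropping step can be illustrated with a minimal base R sketch. This is not the package's implementation: the toy data, the toy_nodes list, and the variable names are hypothetical, and a plain data.frame stands in for the data.table the function expects.

```r
# Hypothetical toy data: A and Y play the role of the complete nodes.
toy <- data.frame(
  W = c(1.2, NA, 0.7, 3.1, NA),
  A = c(1, 0, NA, 1, 0),
  Y = c(0.5, 1.1, 0.3, NA, 2.0)
)
toy_nodes <- list(W = "W", A = "A", Y = "Y")
complete_nodes <- c("A", "Y")

# Collect the variables behind the nodes that must be observed.
complete_vars <- unlist(toy_nodes[complete_nodes], use.names = FALSE)

# Keep only rows with no missing values in those variables.
keep <- stats::complete.cases(toy[, complete_vars, drop = FALSE])
toy_complete <- toy[keep, , drop = FALSE]

# Number of observations removed, analogous to the n_dropped element.
n_dropped <- sum(!keep)
```

In this sketch the covariate W still contains missing values after the rows are dropped; only the nodes listed in complete_nodes are required to be fully observed.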

Then covariates will be processed as follows:

  1. any covariate with more than max_p_missing missingness will be dropped

  2. indicators of missingness will be generated

  3. missing values will be median-imputed
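
The three covariate steps above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: the covars data, the missing_ column prefix, the 1 = missing coding, and the choice to create indicators only for covariates that actually have missing values are all hypothetical.

```r
# Hypothetical covariates with varying amounts of missingness.
covars <- data.frame(
  W1 = c(1, NA, 3, 4, 5, 6),      # 1 of 6 values missing
  W2 = c(NA, NA, NA, NA, 2, 8),   # 4 of 6 values missing
  W3 = c(10, 20, 30, 40, 50, 60)  # fully observed
)
max_p_missing <- 0.5

# Step 1: drop covariates whose proportion of missing values exceeds the threshold.
p_missing <- colMeans(is.na(covars))
dropped_cols <- names(p_missing)[p_missing > max_p_missing]
kept <- covars[, p_missing <= max_p_missing, drop = FALSE]

# Steps 2 and 3: for each remaining covariate with missing values,
# record a 0/1 missingness indicator, then fill the gaps with the median.
vars_with_na <- names(kept)[colSums(is.na(kept)) > 0]
for (v in vars_with_na) {
  is_missing <- is.na(kept[[v]])
  kept[[paste0("missing_", v)]] <- as.integer(is_missing)  # assumed naming and coding
  kept[[v]][is_missing] <- stats::median(kept[[v]], na.rm = TRUE)
}
```

The indicator is computed before imputation, since the missingness pattern can no longer be recovered once the values have been filled in.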

Value

list containing the following elements:

  • data, the updated dataset

  • node_list, the updated list of nodes

  • n_dropped, the number of observations dropped

  • dropped_cols, the variables dropped due to excessive missingness


jeremyrcoyle/tmle3 documentation built on May 20, 2022, 7:36 a.m.