process_missing: Preprocess Data to Handle Missing Variables


Description

Process data to account for missingness in preparation for TMLE

Usage

process_missing(data, node_list, complete_nodes = c("A", "Y"),
  impute_nodes = NULL, max_p_missing = 0.5)

Arguments

data

data.table, the data set containing the variables with missing values

node_list

list, the variables that make up each node

complete_nodes

character vector, the nodes that must be fully observed

impute_nodes

character vector, the nodes whose missing values will be imputed

max_p_missing

numeric, the largest tolerable proportion of missing values; a covariate with more missingness than this will be dropped from the analysis
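To make the argument shapes concrete, the following is a minimal sketch of possible inputs. The column names and the W node are hypothetical, chosen only for illustration, and a plain data.frame is used so the snippet runs in base R; since the function expects a data.table, the toy data would need converting (for example with data.table::as.data.table()) before being passed in.

```r
# Toy inputs; all column names are hypothetical.
data <- data.frame(
  age     = c(34, NA, 51, 47),
  bmi     = c(22.1, 27.5, NA, NA),
  trt     = c(1, 0, 1, NA),
  outcome = c(0, 1, 1, 0)
)

# node_list maps each node name to the columns that make it up.
# "A" and "Y" match the default complete_nodes; "W" is an assumed covariate node.
node_list <- list(
  W = c("age", "bmi"),
  A = "trt",
  Y = "outcome"
)

# Proportion of missing values per column: the kind of quantity
# that max_p_missing is compared against.
p_missing <- colMeans(is.na(data))
```

Here p_missing is a named numeric vector with one entry per column, so it can be compared directly against a threshold such as the default max_p_missing = 0.5.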

Details

Rows with missingness in any of the complete_nodes are dropped. Missing values in the variables of the impute_nodes are then median-imputed, and indicator variables of missingness are generated for these nodes.
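The two steps described here can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the helper name handle_required_nodes, the "_missing" suffix for indicator columns, the coding of the indicator (1 = value was missing) and the n_removed element are all hypothetical.

```r
# Hypothetical helper illustrating the two steps; not the package's own code.
handle_required_nodes <- function(data, node_list,
                                  complete_nodes = c("A", "Y"),
                                  impute_nodes = NULL) {
  # Step 1: drop rows with any missing value in a complete node's columns.
  complete_cols <- unlist(node_list[complete_nodes], use.names = FALSE)
  keep <- stats::complete.cases(data[, complete_cols, drop = FALSE])
  data <- data[keep, , drop = FALSE]

  # Step 2: median-impute the impute nodes and record where values were missing.
  impute_cols <- unlist(node_list[impute_nodes], use.names = FALSE)
  for (col in impute_cols) {
    miss <- is.na(data[[col]])
    # Assumed naming and coding: <column>_missing, 1 = value was missing.
    data[[paste0(col, "_missing")]] <- as.numeric(miss)
    data[[col]][miss] <- stats::median(data[[col]], na.rm = TRUE)
  }

  list(data = data, n_removed = sum(!keep))
}

# Toy data with hypothetical column names.
toy <- data.frame(
  age     = c(34, NA, 51, 47),
  trt     = c(1, 0, 1, NA),
  outcome = c(0, 1, 1, 0)
)
nodes <- list(W = "age", A = "trt", Y = "outcome")

out <- handle_required_nodes(toy, nodes, impute_nodes = "W")
```

In this toy run the fourth row is removed because trt is missing, and the remaining missing age is replaced by the median of the observed ages in the rows that were kept.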

Then covariates will be processed as follows:

  1. any covariate whose proportion of missing values exceeds max_p_missing will be dropped

  2. indicators of missingness will be generated

  3. missing values will be median-imputed
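The three covariate steps above can be sketched as follows. This is a minimal base R illustration, not the package's code. The helper name process_covariates_sketch, the "_missing" indicator suffix and the returned element names are hypothetical. Two further assumptions are made: a covariate sitting exactly at max_p_missing is kept, and indicators are created only for covariates that actually have missing values.

```r
# Hypothetical helper illustrating the three covariate steps.
process_covariates_sketch <- function(data, covariates, max_p_missing = 0.5) {
  # Proportion of missing values in each covariate.
  p_missing <- colMeans(is.na(data[, covariates, drop = FALSE]))

  # Step 1: drop covariates with more than max_p_missing missingness.
  dropped <- covariates[p_missing > max_p_missing]
  data <- data[, setdiff(names(data), dropped), drop = FALSE]

  for (col in setdiff(covariates, dropped)) {
    miss <- is.na(data[[col]])
    if (any(miss)) {
      # Step 2: indicator of missingness (assumed coding: 1 = missing).
      data[[paste0(col, "_missing")]] <- as.numeric(miss)
      # Step 3: replace missing values with the observed median.
      data[[col]][miss] <- stats::median(data[[col]], na.rm = TRUE)
    }
  }

  list(data = data, dropped = dropped)
}

# Toy covariates with hypothetical names.
toy <- data.frame(
  x = c(1, NA, 3, 5),   # 25% missing: kept and imputed
  z = c(NA, NA, NA, 2), # 75% missing: dropped at the default threshold
  w = c(1, 2, 3, 4)     # fully observed: left unchanged
)

res <- process_covariates_sketch(toy, covariates = c("x", "z", "w"))
```

Creating the indicator before imputing matters: once the median has been filled in, the positions of the originally missing values can no longer be recovered from the column itself.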

Value

list containing the following elements:


lurui0421/Super-Learning- documentation built on July 4, 2019, 1:02 p.m.