process_data | R Documentation |
Called when a task is created. It processes the covariates in the data supplied to the task and identifies missing values in the outcome. See the parameters and details for more information.
process_data(data, nodes, column_names, flag = TRUE,
drop_missing_outcome = FALSE)
data |
A |
nodes |
A list of character vectors for |
column_names |
A named list of column names in the data, which is
generated when creating the |
flag |
Logical (default |
drop_missing_outcome |
Logical (default |
If the data provided to the task contains missing covariate values, then
a few things will happen. First, each covariate whose proportion of missing
values is greater than getOption("sl3.max_p_missing") will be dropped. (The
default value of the option "sl3.max_p_missing" is 0.5; it can be changed to,
say, 0.75 by setting options("sl3.max_p_missing" = 0.75).) Second, for each
covariate with missing values that was not dropped, a so-called "missingness
indicator" (named after the covariate, with the prefix "delta_") will be added
as an additional covariate. The missingness indicator takes a value of 0 if
the covariate value was missing and 1 if not. Third, imputation will be
performed for each covariate with missing values: continuous covariates are
imputed with the median, and discrete covariates are imputed with the mode.
Together, imputation removes the missing covariate values and the missingness
indicators preserve the pattern of missingness. To avoid this default
imputation, users can perform imputation on their analytic dataset before
supplying it to make_sl3_Task. We generally recommend that the missingness
indicators be added regardless of the imputation strategy, unless missingness
is very rare.
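The covariate handling described above can be sketched in base R. This is an illustration of the described behaviour only, not the sl3 implementation: handle_missing_covariates and the toy data are hypothetical, and treating numeric columns as continuous and everything else as discrete is a simplifying assumption.

```r
# Hypothetical helper (not part of sl3): mimics the described steps in base R.
# The threshold is read from the documented option, falling back to 0.5
# when the option is unset (e.g. when sl3 is not loaded).
handle_missing_covariates <- function(df, covariates,
                                      max_p_missing = getOption("sl3.max_p_missing", 0.5)) {
  kept <- character(0)
  for (nm in covariates) {
    is_na <- is.na(df[[nm]])
    if (!any(is_na)) {
      kept <- c(kept, nm)          # nothing missing: keep the column as-is
      next
    }
    if (mean(is_na) > max_p_missing) {
      df[[nm]] <- NULL             # too much missingness: drop the covariate
      next
    }
    # Missingness indicator: 0 if the value was missing, 1 if observed
    delta_name <- paste0("delta_", nm)
    df[[delta_name]] <- as.integer(!is_na)
    # Impute: median for numeric columns, mode otherwise (assumption, see above)
    observed <- df[[nm]][!is_na]
    fill <- if (is.numeric(observed)) {
      median(observed)
    } else {
      names(which.max(table(observed)))
    }
    df[[nm]][is_na] <- fill
    kept <- c(kept, nm, delta_name)
  }
  list(data = df, covariates = kept)
}

toy <- data.frame(
  age = c(30, NA, 50, 40),       # 25% missing: imputed, indicator added
  sex = c("f", "m", NA, "f"),    # 25% missing: imputed, indicator added
  lab = c(NA, NA, NA, 1.2),      # 75% missing: dropped at the 0.5 threshold
  bmi = c(22, 25, 27, 24)        # complete: left untouched
)
res <- handle_missing_covariates(toy, c("age", "sex", "lab", "bmi"))
res$covariates
res$data
```

With the threshold raised to 0.75 (for example via options("sl3.max_p_missing" = 0.75)), the lab column in this toy data would be kept and imputed rather than dropped, because its missing proportion is not greater than 0.75.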
This function also converts any character covariates to factors, and one-hot encodes factor covariates.
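As a rough picture of what converting characters to factors and one-hot encoding look like, here is a base R sketch. It uses model.matrix rather than sl3 internals, so the column names and the choice to keep one indicator per level are assumptions; the encoding sl3 actually produces may differ (for instance, in naming or in whether a reference level is dropped).

```r
# Toy data (hypothetical): one character covariate and one numeric covariate
toy <- data.frame(
  site = c("a", "b", "c", "a"),
  dose = c(1.0, 2.5, 0.5, 3.0),
  stringsAsFactors = FALSE
)

# Step 1: convert character columns to factors
is_chr <- vapply(toy, is.character, logical(1))
toy[is_chr] <- lapply(toy[is_chr], factor)

# Step 2: one-hot encode the factor; "- 1" removes the intercept so that
# every level gets its own 0/1 column
site_dummies <- model.matrix(~ site - 1, data = toy)

# Combine the numeric covariate with the indicator columns
encoded <- cbind(toy["dose"], site_dummies)
encoded
```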
Lastly, if the outcome is supplied in creating the sl3_Task and if missing
outcome values are detected in data, then a warning will be thrown. If
drop_missing_outcome = TRUE then observations with missing outcomes will be
dropped.
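The outcome check can be pictured with a small base R sketch. check_outcome, the warning text, and the toy data are hypothetical and only mirror the behaviour described above; they are not taken from sl3.

```r
# Hypothetical helper (not part of sl3): warn on missing outcomes and,
# if requested, drop the affected observations.
check_outcome <- function(df, outcome, drop_missing_outcome = FALSE) {
  missing_y <- is.na(df[[outcome]])
  if (any(missing_y)) {
    warning(sprintf("Missing outcome values detected in %d row(s).", sum(missing_y)))
    if (drop_missing_outcome) {
      df <- df[!missing_y, , drop = FALSE]
    }
  }
  df
}

toy <- data.frame(x = c(1, 2, 3, 4), y = c(0.5, NA, 1.5, 2.0))

# Default: a warning is raised and all rows are kept
kept_all <- suppressWarnings(check_outcome(toy, "y"))

# drop_missing_outcome = TRUE: the row with the missing outcome is removed
dropped <- suppressWarnings(check_outcome(toy, "y", drop_missing_outcome = TRUE))
```

suppressWarnings is used here only to keep the example output quiet; in practice the warning is the signal that missing outcomes are present.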
A list of processed data, nodes and column names