impute_tree: Decision Tree Imputation
In simputation: Simple Imputation

impute_cart

R Documentation

Decision Tree Imputation

Description

Imputation based on CART models or Random Forests.

Usage

impute_cart(
  dat,
  formula,
  add_residual = c("none", "observed", "normal"),
  cp,
  na_action = na.rpart,
  impute_all = FALSE,
  ...
)

impute_rf(
  dat,
  formula,
  add_residual = c("none", "observed", "normal"),
  na_action = na.omit,
  impute_all = FALSE,
  ...
)

Arguments

`dat`	`[data.frame]`, with variables to be imputed and their predictors.
`formula`	`[formula]` imputation model description (see Model specification below).
`add_residual`	`[character]` Type of residual to add. `"normal"` means that the imputed value is drawn from `N(mu,sd)` where `mu` and `sd` are estimated from the model's residuals (`mu` should equal zero in most cases). If `add_residual = "observed"`, residuals are drawn (with replacement) from the model's residuals. Ignored for non-numeric predicted variables.
`cp`	The complexity parameter used to `prune` the CART model. If omitted, no pruning takes place. If a single number, the same complexity parameter is used for each imputed variable. If a vector with one value per imputed variable, the complexity parameters must be in the same order as the predicted variables in `formula`.
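`na_action`	`[function]` What to do with missings in training data. The default differs by function, as shown in Usage: `na.rpart` for `impute_cart` and `na.omit` for `impute_rf`. With `na.omit`, cases with missing values in predicted or predictor variables are omitted (see ‘Missings in training data’).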
`impute_all`	`[logical]` If FALSE (default) then only missings in predicted variables are imputed. If TRUE, predictions are imputed for all records and if a prediction cannot be made then NA is imputed.
`...`	Further arguments passed to `rpart` (for `impute_cart`) or `randomForest` (for `impute_rf`).
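
The arguments above can be combined as in the following minimal sketch. It assumes the simputation and rpart packages are installed and uses the built-in iris data with a few values blanked out for illustration. The object names (`dat`, `imp_plain`, and so on) and the `cp` value of 0.05 are arbitrary choices, not recommendations.

```r
library(simputation)

# Illustrative data: blank out a few values of one numeric variable.
dat <- iris
dat[1:5, "Sepal.Length"] <- NA

# Plain CART prediction; add_residual defaults to "none" and no pruning occurs.
imp_plain <- impute_cart(dat, Sepal.Length ~ Sepal.Width + Petal.Length + Species)

# Same model, but each imputed value gets a residual sampled
# (with replacement) from the model's observed residuals.
set.seed(1)
imp_resid <- impute_cart(
  dat, Sepal.Length ~ Sepal.Width + Petal.Length + Species,
  add_residual = "observed"
)

# Prune the fitted tree with a single complexity parameter.
imp_pruned <- impute_cart(
  dat, Sepal.Length ~ Sepal.Width + Petal.Length + Species,
  cp = 0.05
)

# Compare the imputed values for the rows that were blanked out.
cbind(
  plain  = imp_plain$Sepal.Length[1:5],
  resid  = imp_resid$Sepal.Length[1:5],
  pruned = imp_pruned$Sepal.Length[1:5]
)
```

Only the predicted variable changes; with `impute_all = FALSE` the observed values of `Sepal.Length` and all other columns are left as they were.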

Model specification

Formulas are of the form

IMPUTED_VARIABLES ~ MODEL_SPECIFICATION [ | GROUPING_VARIABLES ]

The left-hand-side of the formula object lists the variable or variables to be imputed. Variables on the right-hand-side are used as predictors in the CART or random forest model.

If grouping variables are specified, the data set is split according to the values of those variables, and model estimation and imputation occur independently for each group.

Grouping using dplyr::group_by is also supported. If groups are defined in both the formula and using dplyr::group_by, the data is grouped by the union of grouping variables. Any missing value in one of the grouping variables results in an error.
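
As a hedged illustration of the formula syntax described above, the sketch below imputes several variables at once and then fits a separate tree per group. It assumes simputation and rpart are installed; the data and object names are made up for the example, and grouping through dplyr::group_by is not shown.

```r
library(simputation)

# Illustrative data: missing values in two numeric variables.
dat <- iris
dat[c(1, 51, 101), "Sepal.Length"] <- NA
dat[c(2, 52), "Sepal.Width"] <- NA

# Several variables on the left-hand side: each one is imputed
# with its own tree, using the same right-hand-side predictors.
imp_multi <- impute_cart(
  dat,
  Sepal.Length + Sepal.Width ~ Petal.Length + Petal.Width + Species
)

# Grouping in the formula: the data are split by Species, and a
# separate tree is fitted and used for imputation within each group.
imp_grouped <- impute_cart(
  dat,
  Sepal.Length ~ Petal.Length + Petal.Width | Species
)

# Count remaining missing values per column after each call.
colSums(is.na(imp_multi))
colSums(is.na(imp_grouped))
```

In the grouped call only `Sepal.Length` appears on the left-hand side, so the missing values in `Sepal.Width` are left untouched.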

Methodology

CART imputation by impute_cart can be used for numerical, categorical, or mixed data. Missing values are estimated using a Classification and Regression Tree as specified by Breiman et al. (1984). Because such trees can fall back on surrogate splits when a predictor value is missing, prediction is fairly robust against missingness in predictors.

Random Forest imputation with impute_rf can be used for numerical, categorical, or mixed data. Missing values are estimated using a Random Forest model as specified by Breiman (2001).
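
The sketch below illustrates both methods on a categorical variable, and shows CART imputing a record whose predictors are themselves partly missing. It assumes simputation and rpart are installed. The random forest call is guarded because it additionally needs the randomForest package. Data and object names are illustrative only.

```r
library(simputation)

# Illustrative data: missing values in a categorical variable.
dat <- iris
dat[c(1, 60, 120), "Species"] <- NA

# CART imputation of a factor: the predicted class is imputed.
imp_cart <- impute_cart(
  dat, Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width
)

# Random forest imputation of the same variable, if randomForest is available.
if (requireNamespace("randomForest", quietly = TRUE)) {
  set.seed(1)
  imp_rf <- impute_rf(
    dat, Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width
  )
  print(imp_rf$Species[c(1, 60, 120)])
}

# A record with a missing target AND a missing predictor:
# the tree can still produce a prediction for it.
dat2 <- iris
dat2[10, c("Sepal.Length", "Petal.Length")] <- NA
imp_partial <- impute_cart(
  dat2, Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width
)
imp_partial[10, ]
```

In the last call the missing `Petal.Length` in row 10 is not imputed, since it only appears as a predictor.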

References

Breiman, L., Friedman, J.H., Olshen, R.A. and Stone, C.J., 1984. Classification and regression trees. CRC Press.

Breiman, L., 2001. Random forests. Machine Learning, 45(1), pp.5-32.
