impute_model: Generate a model to impute missing data in a column


View source: R/imputation.R

Description

Generates a model to impute missing data in a column. We don't recommend imputing categorical variables with many levels.

Usage

impute_model(data, column, NA_value = is.na, exclude_columns,
  controls = NA, type = "xgboost")

Arguments

data

The dataset, as a data.frame.

column

The column to impute, given as a string.

NA_value

A function that defines what counts as missing for this column. It takes one column vector as input and returns TRUE where a value is missing and FALSE otherwise.
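As an illustration of the kind of function NA_value expects, here is a minimal sketch in base R. The helper name is_missing_label and the sentinel values "" and "unknown" are assumptions made for this example; they are not part of datapiper.

```r
# Hypothetical helper (not part of datapiper): treat NA, empty strings,
# and the sentinel "unknown" as missing values.
is_missing_label <- function(x) {
  # %in% returns FALSE for NA, so is.na() handles real NAs separately
  is.na(x) | x %in% c("", "unknown")
}

labels <- c("a", "", NA, "unknown", "b")
missing_mask <- is_missing_label(labels)
print(missing_mask)
```

A function like this takes one vector and returns a logical vector of the same length, so it should be usable as NA_value = is_missing_label in place of the default is.na.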

exclude_columns

Columns that shouldn't be included in the imputation.

controls

Either:

  • NA for defaults

  • A list of xgboost parameters. It must always contain at least nrounds.
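A controls list might look like the sketch below. Only nrounds is required according to this page; max_depth and eta are standard xgboost parameters, but whether impute_model forwards them from a flat list like this is an assumption, and the values are arbitrary.

```r
# Assumed shape of a controls list; only nrounds is documented as required.
controls <- list(
  nrounds = 50,    # number of boosting rounds (required per this page)
  max_depth = 3,   # xgboost tree depth; assumed to be passed through
  eta = 0.1        # xgboost learning rate; assumed to be passed through
)

# Quick sanity check before handing the list to impute_model
stopifnot("nrounds" %in% names(controls))
```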

type

The type of algorithm to use for imputation. Options: mean, lm, and xgboost. The mean option uses the mean for numeric columns and the mode for non-numeric columns.
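To make the mean option concrete, the following base R sketch computes the fill value the description implies: the mean of the observed values for a numeric column, and the most frequent observed value otherwise. The helper fill_value is hypothetical and is not datapiper's implementation; ties in the mode resolve to the first level in table order.

```r
# Hypothetical helper, not datapiper's implementation.
fill_value <- function(x) {
  observed <- x[!is.na(x)]
  if (is.numeric(observed)) {
    # Numeric column: mean of the observed values
    mean(observed)
  } else {
    # Non-numeric column: the mode, i.e. the most frequent observed value
    names(which.max(table(observed)))
  }
}

ages <- c(20, NA, 40)
colours <- c("red", "blue", NA, "red")

fill_value(ages)     # 30
fill_value(colours)  # "red"
```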

Value

The model built based on type.


jeroenvdhoven/datapiper documentation built on July 14, 2019, 9:34 p.m.