get_model_data: Minimizes dataset to data needed for modelling

View source: R/get_model_data.R

get_model_data    R Documentation

Minimizes dataset to data needed for modelling

Description

get_model_data() reduces the dataset to the variables needed for the model (when reduce_columns is TRUE) and removes missing data. If test_col is not NULL, test sets are removed as well. The filter_na argument controls how missing data are handled. If filter_na is "all" (the default), any observation with NA values is removed using na.omit(). If it is "response" or "predictors", only rows with missing dependent or independent variables are removed, respectively. If it is "none", no filtering is done.
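The four filter_na modes described above can be illustrated with a minimal base R sketch. The helper filter_na_sketch() and the toy data frame are hypothetical and are not part of augury; the sketch assumes a single response column and applies "all" to every column of the data frame it is given, which may differ from what get_model_data() does internally.

```r
# Hypothetical helper, NOT part of augury: mimics the four filter_na modes
# described above using base R only.
filter_na_sketch <- function(df, response, predictors,
                             filter_na = c("all", "response", "predictors", "none")) {
  filter_na <- match.arg(filter_na)
  switch(
    filter_na,
    # drop every row that contains at least one NA
    all = stats::na.omit(df),
    # drop rows where the dependent variable is missing
    response = df[!is.na(df[[response]]), , drop = FALSE],
    # drop rows where any independent variable is missing
    predictors = df[stats::complete.cases(df[, predictors, drop = FALSE]), , drop = FALSE],
    # leave the data untouched
    none = df
  )
}

# Toy data: row 2 has a missing response, row 3 a missing predictor
toy <- data.frame(
  y  = c(1, NA, 3, 4),
  x1 = c(10, 20, NA, 40),
  x2 = c(5, 6, 7, 8)
)

n_all        <- nrow(filter_na_sketch(toy, "y", c("x1", "x2"), "all"))
n_response   <- nrow(filter_na_sketch(toy, "y", c("x1", "x2"), "response"))
n_predictors <- nrow(filter_na_sketch(toy, "y", c("x1", "x2"), "predictors"))
n_none       <- nrow(filter_na_sketch(toy, "y", c("x1", "x2"), "none"))
```

In this toy case "all" keeps two rows, "response" and "predictors" keep three rows each, and "none" keeps all four. Removal of test sets via test_col is not covered by this sketch.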

Usage

get_model_data(
  df,
  formula_vars,
  test_col,
  group_col = NULL,
  filter_na,
  reduce_columns = TRUE
)

Arguments

df

Data frame of model data.

formula_vars

Character vector of variables used in the model. Can be extracted from a formula using all.vars(fmla).
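As a small illustration of the extraction mentioned above, all.vars() returns the variable names in a formula and ignores function calls such as log(). The formula and its variable names are made up for this example.

```r
# Illustrative formula; y, x1 and x2 are made-up variable names
fmla <- y ~ x1 + log(x2)

# all.vars() returns the variable names only, without the log() call
formula_vars <- all.vars(fmla)
formula_vars
```

The resulting character vector contains "y", "x1" and "x2", and could be passed as formula_vars.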

test_col

Name of logical column specifying which response values to remove for testing the model's predictive accuracy. If NULL, ignored. See model_error() for details on the methods and metrics returned.

group_col

Column name(s) of group(s) to use in dplyr::group_by() when supplying type, when calculating mean absolute scaled error on data involving time series, and, if group_models is set, when fitting and predicting models. If NULL, not used. Defaults to NULL in get_model_data(), as shown in Usage.

filter_na

Character value specifying how, if at all, to filter NA values from the dataset prior to applying the model. By default ("all"), all observations with missing values are removed. "response" or "predictors" removes rows only if they have missing dependent or independent variables, respectively, and "none" applies no filtering.

reduce_columns

Logical indicating whether to reduce the columns in the data to just those necessary for modelling.
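A rough sketch of what column reduction could look like in base R follows. It assumes that the "necessary" columns are the formula variables plus any test and group columns; that is an assumption for illustration, not a statement of how augury selects columns. The helper reduce_columns_sketch() and the toy data are hypothetical.

```r
# Hypothetical helper, NOT part of augury. Assumes the columns needed for
# modelling are the formula variables plus the test and group columns.
reduce_columns_sketch <- function(df, formula_vars,
                                  test_col = NULL, group_col = NULL) {
  keep <- unique(c(formula_vars, test_col, group_col))
  # intersect() preserves the column order of the original data frame
  df[, intersect(names(df), keep), drop = FALSE]
}

# Toy data with two columns (year, notes) that the model does not use
toy <- data.frame(
  iso3  = c("AFG", "AFG"),
  year  = c(2000, 2001),
  y     = c(1, 2),
  x1    = c(3, 4),
  notes = c("a", "b")
)

reduced <- reduce_columns_sketch(toy, all.vars(y ~ x1), group_col = "iso3")
names(reduced)
```

Here the reduced data frame keeps only iso3, y and x1, in their original order.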

Value

A data frame.


caldwellst/augury documentation built on Oct. 10, 2024, 8:20 a.m.