data_partition: "data_partition" Constructor

Description

Constructs an object of class data_partition.

Usage

data_partition(
  train,
  test,
  y,
  x = NULL,
  offset = NULL,
  weights = NULL,
  na_action = na.omit
)

Arguments

train

A data.frame containing the training data to be used for model fitting.

test

A data.frame containing the test data to be used for model validation.

y

Character string giving the name of the column containing the response variable to be predicted.

x

(Optional) character vector giving the names of the columns containing the predictor variables. If omitted, defaults to all columns other than those named as y, offset, or weights.
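The default for x can be pictured with base R alone. The following is only a sketch of the rule described above, not the package's own code. The variable names are illustrative and mtcars stands in for a training data frame.

```r
# Illustrative sketch only; beset's internal implementation may differ.
train_df <- mtcars
y_name <- "mpg"
offset_name <- NULL   # no offset column in this example
weights_name <- NULL  # no weights column in this example

# c() drops NULL entries, so unused arguments contribute no names.
x_default <- setdiff(names(train_df), c(y_name, offset_name, weights_name))
x_default
```

With only y named, every other column of the data frame is treated as a predictor.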

offset

(Optional) character string giving the name of the column containing a model offset. An offset is a known predictor that is added to a linear model as is (with a beta coefficient of 1) rather than having its beta coefficient optimized. If given, an offset must be included for both the train and test data frames.
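To make the role of an offset concrete, here is a minimal base R sketch using stats::glm rather than the beset functions. The simulated data and the column names (exposure, log_exposure, count) are assumptions made for illustration.

```r
# Simulated count data; all column names here are hypothetical.
set.seed(1)
sim <- data.frame(exposure = runif(100, 1, 10), x = rnorm(100))
sim$count <- rpois(100, lambda = sim$exposure * exp(0.3 * sim$x))
sim$log_exposure <- log(sim$exposure)

# The offset enters the linear predictor with its coefficient fixed at 1,
# so no coefficient is estimated for it.
fit_off <- glm(count ~ x + offset(log_exposure), family = poisson, data = sim)
coef_names <- names(coef(fit_off))

# Prediction needs the offset column in the new data as well.
pred <- predict(fit_off, newdata = sim[1:5, ], type = "response")
```

Only the intercept and x receive estimated coefficients, and the final line shows why the offset column has to be available in the test data too.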

weights

(Optional) character string giving the name of the column containing observation weights. Use these if you want some rows of the data frame to exert more or less influence than others on a model fit. If given, the weights column is only applied during model training; a weights column in the test data will be ignored.
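As a rough illustration of what observation weights do, the sketch below uses stats::lm directly, not beset. The weight column w is a made-up example. For whole-number weights, weighting a row by 2 gives the same coefficient estimates as including that row twice.

```r
# Hypothetical weights column added to a training subset of mtcars.
w_train <- mtcars[1:16, ]
w_train$w <- rep(c(1, 2), times = 8)

# Weighted fit: rows with w = 2 count twice as much as rows with w = 1.
fit_w <- lm(mpg ~ wt, data = w_train, weights = w)

# Equivalent fit obtained by physically repeating each row w times.
expanded <- w_train[rep(seq_len(nrow(w_train)), times = w_train$w), ]
fit_rep <- lm(mpg ~ wt, data = expanded)
```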

na_action

Function defining how NAs should be treated. Options include na.omit (default), na.fail, na.exclude, and na.pass.
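The base R functions named above behave as follows on a small data frame. This sketch shows the stats functions themselves. How data_partition applies them internally is not shown here.

```r
# Toy data frame with missing values in both columns.
na_df <- data.frame(y = c(1, 2, NA, 4), x = c(10, NA, 30, 40))

omitted <- na.omit(df <- na_df)    # drops every row containing an NA
excluded <- na.exclude(na_df)      # same rows dropped, but tagged so that
                                   # residuals and predictions can be padded
passed <- na.pass(na_df)           # returns the data unchanged
failed <- tryCatch(na.fail(na_df), # stops with an error if any NA is present
                   error = function(e) "error")
```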

Details

A data_partition object is a list containing exactly two data frames (train and test). This object will normally be constructed by passing a single data frame to partition. Use this constructor function when you want to manually link two independent data sets: one to be used for model training and the other for model testing.

data_partition objects can be passed as the data argument to the beset modeling functions (beset_lm, beset_glm, and beset_elnet), in which case these functions will train and cross-validate models using the train data and append additional evaluation metrics using the test data.

Note that earlier development versions of these functions provided an optional test_data argument for this purpose. That argument has been removed. You must now construct a data_partition object beforehand, because the data_partition constructor performs several checks to ensure that your test data are compatible with your train data: 1) all predictor and response variables are present in both data sets, 2) the levels of all factor variables are the same for both data sets, 3) if an offset variable is used for model training, an offset variable is provided for predicting the test data, and 4) unless na_action is set to na.pass, both data frames contain complete cases with no missing data. The data_partition constructor will alert you to potential issues, attempt to resolve them, and return an error if it cannot.
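The first two compatibility checks can be sketched in base R. The helper below, check_compatible, is hypothetical and is not part of beset. It only reports problems, whereas the real constructor is described as also attempting to resolve them.

```r
# Hypothetical helper for illustration; not the beset implementation.
check_compatible <- function(train, test, vars) {
  problems <- character(0)
  # Check 1: every named variable must appear in both data frames.
  missing_vars <- setdiff(vars, intersect(names(train), names(test)))
  if (length(missing_vars) > 0) {
    problems <- c(problems, paste("missing in one data set:", missing_vars))
  }
  # Check 2: factor variables must have identical levels in both.
  for (v in setdiff(vars, missing_vars)) {
    if (is.factor(train[[v]]) &&
        !identical(levels(train[[v]]), levels(test[[v]]))) {
      problems <- c(problems, paste("factor levels differ:", v))
    }
  }
  problems
}

# Toy data: z is absent from the test set and g has mismatched levels.
toy_train <- data.frame(y = c(1, 2, 3), g = factor(c("a", "b", "a")), z = 1:3)
toy_test <- data.frame(y = c(4, 5), g = factor(c("a", "c")))
issues <- check_compatible(toy_train, toy_test, c("y", "g", "z"))
```

An empty character vector from this helper would mean that neither kind of mismatch was found.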

Value

A data_partition object containing a train data frame and a test data frame.

Examples

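# Split mtcars into two halves to act as independent train and test sets
train <- mtcars[1:16, ]
test <- mtcars[17:32, ]

# Convert the categorical columns to factors in both data sets
factor_names <- c("cyl", "vs", "am", "gear", "carb")
train[factor_names] <- purrr::map_dfc(train[factor_names], factor)
test[factor_names] <- purrr::map_dfc(test[factor_names], factor)

# Link the two data sets, naming "mpg" as the response variable
data <- data_partition(train, test, "mpg")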


jashu/beset documentation built on April 20, 2023, 5:28 a.m.