data_partition: "data_partition" Constructor
In jashu/beset: Best Subset Predictive Modeling

Description

Constructs an object of class data_partition.

Usage

data_partition(
  train,
  test,
  y,
  x = NULL,
  offset = NULL,
  weights = NULL,
  na_action = na.omit
)

Arguments

`train`	A `data.frame` containing the training data to be used for model fitting.
`test`	A `data.frame` containing the test data to be used for model validation.
`y`	`Character` string giving the name of the column containing the response variable to be predicted.
`x`	(Optional) `character` vector giving the names of the columns containing the predictor variables. If omitted, defaults to all columns other than those named as `y`, `offset`, or `weights`.
`offset`	(Optional) `character` string giving the name of the column containing a model offset. An offset is a known predictor that is added to a linear model as is (with a beta coefficient of 1) rather than having its beta coefficient optimized. If given, an offset must be included for both the `train` and `test` data frames.
`weights`	(Optional) `character` string giving the name of the column containing observation weights. Use these if you want some rows of the data frame to exert more or less influence than others on a model fit. If given, the `weights` column is only applied during model training; a `weights` column in the `test` data will be ignored.
`na_action`	`Function` defining how `NA`s should be treated. Options include `na.omit` (default), `na.fail`, `na.exclude`, and `na.pass`.
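
To make the `offset` and `weights` arguments more concrete, here is a minimal sketch using base R's `glm()` rather than beset itself. The simulated data frame `d` and its column names (`log_exposure`, `w`, `count`) are assumptions made up for illustration. It shows that an offset enters the model with a fixed coefficient of 1, so no coefficient is estimated for it, while the weights only change how much each row influences the fit.

```r
# Illustration only: base R's glm(), not beset's internal code.
set.seed(1)
n <- 200
d <- data.frame(x = rnorm(n), exposure = runif(n, 1, 10))
d$log_exposure <- log(d$exposure)            # hypothetical offset column
d$count <- rpois(n, lambda = d$exposure * exp(0.5 * d$x))
d$w <- rep(c(1, 2), length.out = n)          # hypothetical observation weights

# The offset is added to the linear predictor as is (coefficient fixed at 1);
# the weights make some rows count more than others during fitting.
fit <- glm(count ~ x, family = poisson, data = d,
           offset = log_exposure, weights = w)

# Only the intercept and x have estimated coefficients; the offset has none.
names(coef(fit))
```

With `data_partition`, the same columns would be identified by name, for example `offset = "log_exposure"` and `weights = "w"`, following the signature shown under Usage.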

Details

A data_partition object is a list containing exactly two data frames (train and test). This object will normally be constructed by passing a single data frame to partition. Use this constructor function in the event that you wish to manually link two independent data sets: one to be used for model training and the other to be used for model testing.

data_partition objects can be passed as the data argument to the beset modeling functions (beset_lm, beset_glm, and beset_elnet), in which case these functions will train and cross-validate models using the train data and append additional evaluation metrics using the test data.

Note that in earlier development versions, these functions provided an optional test_data argument for this purpose. This argument has been removed, and you are now required to construct a data_partition object beforehand, because the data_partition constructor performs a number of important checks to ensure that your test data are compatible with your train data:

1) all predictor and response variables are present in both data sets,
2) the levels of all factor variables are the same for both data sets,
3) if an offset variable is used for model training, an offset variable is provided for predicting the test data, and
4) unless na_action is set to na.pass, both data frames contain complete cases with no missing data.

The data_partition constructor will alert you to potential issues, attempt to resolve them, and return an error if it can't.
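
The four checks described above can be sketched in base R as follows. This is not the package's implementation: `check_partition` is a hypothetical helper written for illustration, and aligning mismatched factors to the union of their levels is an assumed resolution strategy, since the documentation does not say how the constructor resolves such issues.

```r
# Hypothetical helper, NOT part of beset: mirrors the four documented checks.
check_partition <- function(train, test, y, x = NULL, offset = NULL,
                            weights = NULL, na_action = na.omit) {
  # Default predictors: every column not named as y, offset, or weights
  if (is.null(x)) x <- setdiff(names(train), c(y, offset, weights))
  vars <- c(y, x, offset)

  # Checks 1 and 3: response, predictors, and any offset exist in both sets
  missing_vars <- c(setdiff(vars, names(train)), setdiff(vars, names(test)))
  if (length(missing_vars) > 0) {
    stop("Missing columns: ", paste(unique(missing_vars), collapse = ", "))
  }

  # Check 4: apply the NA handler (na.omit drops incomplete rows)
  train <- na_action(train[vars])
  test <- na_action(test[vars])

  # Check 2: give each factor the same levels in both sets
  # (assumption: align to the union of levels)
  for (v in vars) {
    if (is.factor(train[[v]])) {
      lv <- union(levels(train[[v]]), levels(factor(test[[v]])))
      train[[v]] <- factor(train[[v]], levels = lv)
      test[[v]] <- factor(test[[v]], levels = lv)
    }
  }
  list(train = train, test = test)
}

train <- mtcars[1:16, ]
test <- mtcars[17:32, ]
train$carb <- factor(train$carb)
test$carb <- factor(test$carb)   # has levels that train lacks, and vice versa
test$hp[1] <- NA                 # introduce one missing value

checked <- check_partition(train, test, y = "mpg", x = c("carb", "hp"))
nrow(checked$test)               # 15: the incomplete row was dropped
identical(levels(checked$train$carb), levels(checked$test$carb))  # TRUE
```

Running these checks once, at construction time, means a modeling function that receives the resulting object can assume the two data frames are compatible.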

Value

A data_partition object containing a train data frame and a test data frame.

Examples

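# Split mtcars into two halves to stand in for separate data sets
train <- mtcars[1:16, ]
test <- mtcars[17:32, ]
# Convert the categorical columns to factors in both data frames
factor_names <- c("cyl", "vs", "am", "gear", "carb")
train[factor_names] <- purrr::map_dfc(train[factor_names], factor)
test[factor_names] <- purrr::map_dfc(test[factor_names], factor)
# Link the two data frames, naming "mpg" as the response variable
data <- data_partition(train, test, "mpg")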

jashu/beset documentation built on April 20, 2023, 5:28 a.m.