get_level_1_data: Get level 1 training and testing data


Description

Get level 1 training and testing data for stacking, built from each model wrapper's predictions on data it was not trained on.

Usage

get_level_1_data(training_frame, response, model_wrappers,
  testing_frame = NULL, n_folds = 5, ...)

Arguments

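training_frame

data.frame, the training data used to build the level 1 training set. It may already contain a fold column defined by get_cv_folds.

response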

string, name of the response column

model_wrappers

(named) vector or list of functions. Names are optional; if they are absent, names are generated automatically. See Details below for how to define model wrappers.

testing_frame

data.frame, optional. If present, each model_wrapper is trained on the entire training_frame and then used to predict on the testing_frame.

n_folds

integer, number of cross-validation folds. May be omitted if training_frame already contains a fold column created by get_cv_folds.

...

additional arguments passed to the user-defined model wrappers.

Details

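Level 1 data is always generated from predictions on rows that the predicting model was not trained on. For training_frame, each row's level 1 value comes from a model fit on the other cross-validation folds; for testing_frame, the predictions come from a model fit on the entire training_frame.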
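The idea can be illustrated with a minimal base R sketch. It is not stacker's implementation: oof_level_1 is a hypothetical helper, folds are drawn at random with sample(), only one wrapper is used, and the results are plain numeric vectors rather than the data frames get_level_1_data returns. Each training row is predicted by a model fit on the other folds, while testing rows are predicted by a model fit on all training rows.

```r
# Hypothetical helper (not part of stacker): level 1 data for a single wrapper
oof_level_1 <- function(training_frame, testing_frame, wrapper, n_folds = 5) {
  # Randomly assign each training row to one of n_folds roughly equal folds
  folds <- sample(rep(seq_len(n_folds), length.out = nrow(training_frame)))
  train_level_1 <- numeric(nrow(training_frame))
  for (k in seq_len(n_folds)) {
    held_out <- folds == k
    # Fit on the other folds and predict only on the held-out fold
    train_level_1[held_out] <- wrapper(training_frame[!held_out, , drop = FALSE],
                                       training_frame[held_out, , drop = FALSE])
  }
  # Testing rows are never used for fitting, so fit on every training row
  test_level_1 <- wrapper(training_frame, testing_frame)
  list(train = train_level_1, test = test_level_1)
}

# A simple wrapper: linear model for Petal.Length, returned as an unnamed vector
lm_wrapper <- function(training_frame, validation_frame) {
  fit <- lm(Petal.Length ~ ., data = training_frame)
  as.numeric(predict(fit, newdata = validation_frame))
}

set.seed(1)
lv1 <- oof_level_1(iris[1:100, -5], iris[101:150, -5], lm_wrapper, n_folds = 10)
```

This sketch draws new folds on every call; fixing the folds once, as get_cv_folds does for the real function, lets several wrappers share the same partition.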

Each model wrapper is a function that accepts at least two arguments, training_frame and validation_frame, and returns a numeric vector of length nrow(validation_frame). A wrapper may take other arguments as well, but both training_frame and validation_frame must be present for it to work correctly. The wrapper should train on training_frame, predict on validation_frame, and return the predictions as a numeric vector.

Both training_frame and validation_frame are converted internally to data.tables for memory efficiency. If you are working with smaller data and do not need data.table, you can work with data.frames instead by making these the first two lines of your wrapper function:

training_frame <- as.data.frame(training_frame)
validation_frame <- as.data.frame(validation_frame)
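As a sketch of this contract, the wrapper below converts both frames to data.frames, fits a linear model, and returns an unnamed numeric vector. The names petal_lm_wrapper and check_wrapper_output are hypothetical, and check_wrapper_output is not part of stacker; it only tests the two documented requirements: a numeric result with one value per row of validation_frame.

```r
# A wrapper that follows the contract: train on training_frame, predict on validation_frame
petal_lm_wrapper <- function(training_frame, validation_frame) {
  # Optional: work with plain data.frames instead of data.tables
  training_frame <- as.data.frame(training_frame)
  validation_frame <- as.data.frame(validation_frame)
  # Naming the predictors explicitly means any extra columns are ignored
  fit <- lm(Petal.Length ~ Sepal.Length + Sepal.Width, data = training_frame)
  # predict() returns a named vector; as.numeric() drops the names
  as.numeric(predict(fit, newdata = validation_frame))
}

# Hypothetical helper: checks a wrapper's output against the documented contract
check_wrapper_output <- function(wrapper, training_frame, validation_frame) {
  preds <- wrapper(training_frame, validation_frame)
  is.numeric(preds) && length(preds) == nrow(validation_frame)
}

check_wrapper_output(petal_lm_wrapper, iris[1:100, ], iris[101:150, ])
```

Calling the check on a small split before passing a wrapper to get_level_1_data can catch a wrong output type or length early.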

Value

a list of two data frames: the first is the level 1 training data, and the second is the level 1 data for the testing set.

Examples

# First, define a model wrapper that trains a linear model on training_frame
# and returns its predictions on validation_frame
iris_model_wrapper <- function(training_frame, validation_frame){
  linear_model <- lm(Petal.Length ~ ., data = training_frame)
  predict(linear_model, newdata = validation_frame) # the wrapper's output
}

# Drop the Species column (column 5) and split into training and testing sets
iris_training <- iris[1:100, -5]
iris_testing <- iris[101:150, -5]

# Define a CV fold column for iris_training so that the same folds are used
# for all future model wrappers
iris_training <- get_cv_folds(iris_training, n_folds = 10)

# lv1_data[[1]] is the level 1 training data; lv1_data[[2]] is the level 1 testing data
lv1_data <- get_level_1_data(iris_training,
                             response = "Petal.Length",
                             model_wrappers = c(iris_model_wrapper = iris_model_wrapper),
                             testing_frame = iris_testing)

josh-whitney/stacker documentation built on May 19, 2019, 8:51 p.m.