View source: R/learning-curve.R
Given a training set and a test set, fit a model on increasing fractions of the
training set, up to the full set, with a constant test set per repeat (each
repeat has a different test set). The default is to use 10%, 20%, 30%,
..., 90%, 100%. Care is taken to make sure each fraction contains the previous
one, e.g. all samples present in the 10% subset are also present in the 20%
subset. This simulates the addition of more data, as opposed to drawing a fresh
random sample of more data. Optionally, you can pass all of your data in as
training_data and let the function do the train/test splitting for you.
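One simple way to obtain nested fractions like these is to shuffle the row indices once per repeat and take increasingly long prefixes of that single permutation. The base-R sketch below illustrates the idea; nested_fraction_indices() is a hypothetical helper, not the package's implementation, and rounding each fraction to the nearest whole number of rows (at least one) is an assumption.

```r
# Hypothetical helper (not the package's implementation): return one index
# set per training fraction, all taken from a single random permutation.
nested_fraction_indices <- function(n_rows, training_fracs = seq(0.1, 1, by = 0.1)) {
  stopifnot(n_rows >= 1, all(training_fracs > 0), all(diff(training_fracs) > 0))
  shuffled <- sample.int(n_rows)  # one shuffle per repeat
  lapply(training_fracs, function(frac) {
    # Assumption: round to the nearest row count, keeping at least one row.
    n_keep <- max(1, round(frac * n_rows))
    # Prefixes of the same permutation are nested by construction.
    sort(shuffled[seq_len(n_keep)])
  })
}

set.seed(1)
idx <- nested_fraction_indices(50)
lengths(idx)
all(idx[[1]] %in% idx[[2]])
```

Because every subset is a prefix of the same permutation, idx[[1]] is contained in idx[[2]], and so on up to the full set; rounding (rather than ceiling()) avoids off-by-one sizes from floating-point fractions such as 0.1 + 2 * 0.1.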
The function takes the following arguments.
model_evaluate
: A function with exactly two arguments: training_data and testing_data.
training_data
: A data frame. Subsets of this will be used for training. If …
outcome
: A string. The name of the outcome variable. This must be a column in
training_data.
testing_data
: A data frame. The trained models will all be tested against this constant test set.
testing_frac
: A numeric vector with values between 0 and 1/3. The fraction of
training_data to set aside for testing.
training_fracs
: A numeric vector. Fractions of the training data to use. This must be a positive, increasing vector of real numbers ending in 1.
repeats
: A positive integer. The number of times to repeat the sampling for each
proportion in training_fracs.
strata
: A string. Variable to stratify on when splitting data.
n_cores
: A positive integer. The cross-validation can optionally be done in parallel; specify the number of cores for parallel processing here.
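To make the roles of testing_frac and strata concrete, here is a base-R sketch of a stratified hold-out split. It is an illustration under stated assumptions, not the package's code: stratified_test_indices() is a hypothetical helper, and binning a numeric stratification variable into quartiles follows a common convention (rsample, for instance, bins numeric strata into quantiles) rather than anything documented on this page.

```r
# Hypothetical helper (not the package's code): choose row indices to hold
# out for testing, sampling the same fraction from every stratum.
# Assumptions: numeric strata are binned into quartiles, other types are
# treated as categorical, and the strata column has no missing values.
stratified_test_indices <- function(data, testing_frac, strata = NULL) {
  stopifnot(testing_frac > 0, testing_frac < 1)
  n <- nrow(data)
  groups <- rep(1L, n)  # a single stratum when strata is NULL
  if (!is.null(strata)) {
    s <- data[[strata]]
    if (is.numeric(s)) {
      breaks <- unique(stats::quantile(s, probs = seq(0, 1, 0.25)))
      if (length(breaks) >= 2) {
        groups <- cut(s, breaks = breaks, include.lowest = TRUE)
      }
    } else {
      groups <- as.factor(s)
    }
  }
  # Sample round(testing_frac * stratum size) rows from each stratum.
  per_stratum <- lapply(split(seq_len(n), groups, drop = TRUE), function(rows) {
    rows[sample.int(length(rows), round(testing_frac * length(rows)))]
  })
  sort(unlist(per_stratum, use.names = FALSE))
}

set.seed(42)
df <- data.frame(y = rnorm(100))
test_idx <- stratified_test_indices(df, testing_frac = 0.25, strata = "y")
train_idx <- setdiff(seq_len(nrow(df)), test_idx)
```

Sampling within strata keeps the distribution of the stratification variable roughly the same in the held-out set as in the remaining training data, which matters most when the data set is small.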
The function returns a data frame with the following columns.
rep
: The repeat number.
testing_frac
: The fraction of training_data that is set aside for testing. If the
testing_data argument is specified, testing_frac will be 0, because none of
training_data is set aside for testing.
training_frac
: The fraction of the (post train/test split) training data used for learning.
testing_indices
: The row indices of the training_data argument that were set aside for
testing. If testing_data is specified (and hence none of training_data needs
to be set aside for testing), this will be a vector of NAs with length equal
to the number of rows in testing_data.
training_indices
: The row indices of training_data that were used for learning.
cv
: The cross-validation score.
test
: The test score.
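Assuming the result has one row per combination of repeat, testing fraction and training fraction, a natural next step is to average the scores across repeats before comparing fractions. The sketch below does this with stats::aggregate(); summarise_learning_curve() is a hypothetical helper, and it assumes cv and test are plain numeric columns.

```r
# Hypothetical helper (not part of the package): mean cv and test scores
# across repeats for each (testing_frac, training_frac) combination.
# Assumes cv and test are numeric columns of the learning-curve data frame.
summarise_learning_curve <- function(lc) {
  stats::aggregate(cbind(cv, test) ~ testing_frac + training_frac,
                   data = lc, FUN = mean)
}
```

After running the example below, summarise_learning_curve(mlc) would give one row per fraction combination, provided the column types match these assumptions.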
See also: autoplot.mirvie_learning_curve()
Example:
data("BostonHousing", package = "mlbench")
# Keep only the numeric columns (drops the factor column chas).
bh <- dplyr::select_if(BostonHousing, is.numeric)
# Fit a linear model on the training data and return the training and test
# mean absolute errors.
model_evaluate <- function(training_data, testing_data) {
  trained_mod <- lm(medv ~ ., training_data)
  training_preds <- predict(trained_mod, newdata = training_data)
  preds <- predict(trained_mod, newdata = testing_data)
  c(
    train = yardstick::mae_vec(training_data$medv, training_preds),
    test = yardstick::mae_vec(testing_data$medv, preds)
  )
}
mlc <- mlc0 <- suppressWarnings(
  learn_curve(model_evaluate, bh, "medv",
    training_fracs = c(seq(0.1, 0.7, 0.2), 0.85),
    testing_frac = c(0.25, 0.5), repeats = 8,
    strata = "medv", n_cores = 4
  )
)