holdout: Data selection for holdout validation.

View source: R/holdout.R

holdout    R Documentation

Description

Split a data.frame into two subsets, a calibration sample and a validation sample, for holdout validation.

Usage

holdout(data, prop = 0.5, grouping = NULL, seed = NULL, determined = NULL)

Arguments

data

A data.frame.

prop

A single value or a vector of proportions of the data to assign to the calibration sample. Defaults to 0.5, an even split.

grouping

Name of the grouping variable. Providing a grouping variable ensures that the specified proportion is selected within each group.

seed

A random seed. See Random for more details.

determined

Name of a variable indicating the pre-determined assignment to the calibration or the validation sample. This variable must be a factor containing only NA (no pre-determined assignment), "calibrate", or "validate". If no variable is provided (the default), all cases are assigned randomly.
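
The requirements on the determined variable can be illustrated with a small sketch in base R. The data frame toy and the column name preassigned are hypothetical and chosen only for illustration; the check at the end mirrors the documented constraint and is not taken from the package source.

```r
# Hypothetical toy data; not part of the stuart package.
toy <- data.frame(id = 1:6, score = c(3, 5, 2, 4, 1, 6))

# NA marks cases without a pre-determined assignment. Restricting the
# levels up front guarantees that no other value can occur.
toy$preassigned <- factor(
  c("calibrate", NA, "validate", NA, NA, "calibrate"),
  levels = c("calibrate", "validate")
)

# Check the documented constraint: a factor whose levels are limited
# to "calibrate" and "validate".
is_valid <- is.factor(toy$preassigned) &&
  all(levels(toy$preassigned) %in% c("calibrate", "validate"))
is_valid

# Number of cases fixed in advance versus left for random assignment.
table(toy$preassigned, useNA = "ifany")
```

Assuming variable names are passed as character strings, such a column would then be supplied as determined = "preassigned".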

Value

Returns a list containing two data.frames, called calibrate and validate. The first corresponds to the calibration sample, the second to the validation sample.
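
To make the idea of a within-group split and the shape of the return value concrete, here is a minimal base R sketch. It is not the implementation used by holdout; split_within_groups, toy, and school are hypothetical names, and rounding the per-group sample size is an assumption made for this illustration.

```r
# Hypothetical helper, NOT stuart's implementation: draws the same
# proportion of rows from every group and returns a list shaped like
# the documented return value (calibrate, validate).
split_within_groups <- function(data, prop = 0.5, grouping, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  # Row indices per group.
  rows_by_group <- split(seq_len(nrow(data)), data[[grouping]])
  calib_rows <- unlist(lapply(rows_by_group, function(rows) {
    n_calib <- round(prop * length(rows))  # assumption: round per group
    rows[sample.int(length(rows), n_calib)]
  }), use.names = FALSE)
  # Everything not drawn for calibration goes to validation.
  valid_rows <- setdiff(seq_len(nrow(data)), calib_rows)
  list(
    calibrate = data[calib_rows, , drop = FALSE],
    validate  = data[valid_rows, , drop = FALSE]
  )
}

# Toy data with two groups of unequal size (12 and 8 rows).
toy <- data.frame(id = 1:20, school = rep(c("a", "b"), times = c(12, 8)))
parts <- split_within_groups(toy, prop = 0.75, grouping = "school", seed = 1)

names(parts)                    # "calibrate" "validate"
table(parts$calibrate$school)   # 9 rows from "a", 6 rows from "b"
```

Because the proportion is applied per group, the calibration sample keeps the 12:8 ratio of the groups (9:6), which a single draw over all rows would not guarantee.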

Author(s)

Martin Schultze

See Also

crossvalidate

Examples


# seeded selection: 75% calibration sample, 25% validation sample
library(stuart)
data(fairplayer)
samples <- holdout(fairplayer, prop = .75, seed = 55635)
lapply(samples, nrow) # check size of samples


stuart documentation built on June 7, 2023, 6:12 p.m.