partition | R Documentation |
partition randomly splits a data frame into two model frames, train and test,
which are returned as a "data_partition" structure.
partition(
data,
y,
frac = 0.5,
x = NULL,
offset = NULL,
weights = NULL,
na_action = na.omit,
seed = 42
)
data |
Data frame to be partitioned. |
y |
|
frac |
The fraction of data that should be included in the training set.
Default is 0.5. |
x |
(Optional) |
offset |
(Optional) |
weights |
(Optional) |
na_action |
|
seed |
|
partition creates a train/test split among the rows of a data frame
based on stratified random sampling within the factor levels of a
classification outcome or the quartiles of a numeric outcome. This ensures
that the training and test samples are closely matched in terms of class
incidence or the frequency distribution of the outcome measure. partition
includes a seed argument so that the randomized partitioning is
reproducible. The train and test data frames are returned bound together in
a data_partition structure so that their common ancestry is maintained and
self-documented. For example, if you name your data_partition "data", you
can access the training set with data$train and its corresponding test set
with data$test.
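The stratified sampling described above can be illustrated with a short base-R sketch. This is not the package's implementation: the helper name stratified_split_sketch is hypothetical, it assumes the outcome has no missing values and (if numeric) is not constant, and it returns plain data frames rather than model frames.

```r
# Hypothetical helper, NOT the package's partition(): a base-R sketch of a
# stratified train/test split. Assumes `y` has no missing values.
stratified_split_sketch <- function(data, y, frac = 0.5, seed = 42) {
  set.seed(seed)  # fix the RNG state so the split is reproducible
  outcome <- data[[y]]
  # Strata: quartile bins for a numeric outcome, levels otherwise.
  strata <- if (is.numeric(outcome)) {
    breaks <- unique(quantile(outcome, probs = seq(0, 1, 0.25)))
    cut(outcome, breaks = breaks, include.lowest = TRUE)
  } else {
    factor(outcome)
  }
  # Within each stratum, draw (approximately) `frac` of the row indices.
  rows_by_stratum <- split(seq_len(nrow(data)), strata, drop = TRUE)
  train_idx <- unlist(lapply(rows_by_stratum, function(idx) {
    idx[sample.int(length(idx), size = round(frac * length(idx)))]
  }), use.names = FALSE)
  # Every row not drawn for training goes to the test set.
  test_idx <- setdiff(seq_len(nrow(data)), train_idx)
  list(train = data[sort(train_idx), , drop = FALSE],
       test  = data[test_idx, , drop = FALSE])
}

parts <- stratified_split_sketch(mtcars, y = "mpg", frac = 0.5)
```

Because sampling happens separately inside each quartile (or factor level), the train and test sets each receive roughly the same share of every stratum, which is what keeps their outcome distributions closely matched.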
An object of class "data_partition": a list containing two model frames
named train and test, containing the training and testing sets, respectively.
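To make the shape of the return value concrete, the following base-R sketch builds a stand-in object by hand. It is illustrative only: the rows are split by position rather than by stratified sampling, the elements are plain data frames rather than model frames, and the object name split_obj is an assumption.

```r
# Illustrative stand-in for a "data_partition" object: a list with two
# elements named train and test, tagged with the class attribute.
train_rows <- 1:16
split_obj <- structure(
  list(
    train = mtcars[train_rows, ],
    test  = mtcars[-train_rows, ]
  ),
  class = "data_partition"
)

# Both halves stay bound together and are reached with `$`.
is_partition <- inherits(split_obj, "data_partition")
element_names <- names(split_obj)
n_train <- nrow(split_obj$train)
n_test <- nrow(split_obj$test)
```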
set.seed, data_partition
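# Convert the categorical columns of mtcars to factors
data <- mtcars
factor_names <- c("cyl", "vs", "am", "gear", "carb")
data[factor_names] <- purrr::map_dfc(data[factor_names], factor)
# Split into train and test sets, stratified on the numeric outcome mpg
data <- partition(data, y = "mpg")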