machine_split: Split data into training, validation and test data sets

View source: R/machine_split.R

machine_splitR Documentation

Split data into training, validation and test data sets

Description

Splits the data into three data sets of custom size. Lable proportions in the original data set will be repsected for all three data sets.

Usage

machine_split(data = predictors , group = "timestamp" , behaviour , train_size = 0.6 , val_size = 0.2 , names = c("train_data","val_data","test_data"))

Arguments

data

Complete data set that will be split in three sub sets

group

Variable that identifies data belonging to the same group. Usually timestamp in acc behaviour predictions is the default.

behaviour

Coulmn name with the behaviour labels.

train_size

Proportion of data that will create the training data. Possibile values are between 0 and 1. Default is 0.6.

val_size

Proportion of data that will create the validation data. Possibile values are between 0 and 1. Default is 0.2.

names

Vector of names for the three data sets. Default is c("train_data","val_data","test_data")

Details

This function should be run without saving into an object. The resulting subsets will be created from within the function. The sizes of the sets are free to choose. the test data set will be of size 1- train_size - val_size. In case train_size and val_size add up to 1 no test data will be created. All data sets will have exclusive data meaning no data can be in more than one of the created data subsets. For this reason train_size + val_size must not exceed 1.

Value

Outputs three dplyr tibbles of variable size.

Author(s)

Wanja Rast

Examples

labels <- sample(c("a","b","c","d") , size = 1000 ,
                  replace = T , prob = c(0.5,0.25,0.2,0.05))

data <- data.frame(timestamp = rep(1:1000 , each = 10) ,
                    measurments = rep(1:10 , n = 10000),
                    label = rep(labels , each = 10))

machine_split(data = data , group = "timestamp" , train_size = 0.6 , val_size = 0.2)


wanjarast/accelerateR documentation built on June 21, 2022, 3:29 p.m.