View source: R/machine_split.R
machine_split | R Documentation |
Splits the data into three data sets of custom size. Label proportions in the original data set will be respected in all three data sets.
machine_split(data = predictors , group = "timestamp" , behaviour , train_size = 0.6 , val_size = 0.2 , names = c("train_data","val_data","test_data"))
data |
Complete data set that will be split in three sub sets |
group |
Variable that identifies data belonging to the same group. Default is "timestamp", the usual grouping variable in acc behaviour predictions. |
behaviour |
Column name with the behaviour labels. |
train_size |
Proportion of data that will form the training data. Possible values are between 0 and 1. Default is 0.6. |
val_size |
Proportion of data that will form the validation data. Possible values are between 0 and 1. Default is 0.2. |
names |
Vector of names for the three data sets. Default is c("train_data","val_data","test_data") |
Run this function without assigning its result to an object; the subsets are created from within the function. The set sizes can be chosen freely. The test data set will be of size 1 - train_size - val_size. If train_size and val_size add up to 1, no test data will be created. The data sets are mutually exclusive: no data can appear in more than one of the created subsets. For this reason, train_size + val_size must not exceed 1.
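The size rules above can be illustrated with a minimal base R sketch. This is not the package's implementation: split_sketch is a hypothetical helper written only for illustration, it returns a list instead of creating objects from within the function, and it assumes every group carries exactly one behaviour label. Whole groups are assigned to one set, separately within each label, so that label proportions are approximately preserved and no group appears in two sets.

```r
# Hypothetical helper for illustration only; not the code behind machine_split().
split_sketch <- function(data, group, behaviour,
                         train_size = 0.6, val_size = 0.2) {
  # The three sets are exclusive, so the first two proportions cannot exceed 1.
  stopifnot(train_size >= 0, val_size >= 0, train_size + val_size <= 1)

  # One row per group with its label (assumes one label per group).
  groups <- unique(data[, c(group, behaviour)])
  labs <- as.character(groups[[behaviour]])
  groups$set <- NA_character_

  # Assign groups to sets within each label to keep label proportions.
  for (lab in unique(labs)) {
    idx <- which(labs == lab)
    idx <- idx[sample.int(length(idx))]  # shuffle the groups of this label
    cut_train <- round(length(idx) * train_size)
    cut_val   <- round(length(idx) * (train_size + val_size))
    pos <- seq_along(idx)
    # Whatever remains after train and val is the test share:
    # 1 - train_size - val_size.
    groups$set[idx] <- ifelse(pos <= cut_train, "train",
                       ifelse(pos <= cut_val, "val", "test"))
  }

  # Map each row to the set of its group and split the data accordingly.
  row_set <- groups$set[match(data[[group]], groups[[group]])]
  split(data, factor(row_set, levels = c("train", "val", "test")))
}

set.seed(1)
labels <- sample(c("a", "b", "c", "d"), size = 1000, replace = TRUE,
                 prob = c(0.5, 0.25, 0.2, 0.05))
data <- data.frame(timestamp = rep(1:1000, each = 10),
                   measurement = rep(1:10, times = 1000),
                   label = rep(labels, each = 10))

parts <- split_sketch(data, group = "timestamp", behaviour = "label")
sapply(parts, nrow) / nrow(data)  # realised proportions of the three sets
```

With train_size + val_size equal to 1, the "test" element of the returned list would simply be empty, which mirrors the case where no test data is created.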
Outputs three dplyr tibbles of variable size.
Wanja Rast
labels <- sample(c("a","b","c","d") , size = 1000 , replace = TRUE , prob = c(0.5,0.25,0.2,0.05))
data <- data.frame(timestamp = rep(1:1000 , each = 10) ,
                   measurments = rep(1:10 , times = 1000) ,
                   label = rep(labels , each = 10))
# behaviour has no default, so the label column is passed explicitly
machine_split(data = data , group = "timestamp" , behaviour = "label" , train_size = 0.6 , val_size = 0.2)