split_data: Split data by target's distribution


Description

Splits a data set into n sets that preserve the target's distribution, sampling observations with or without replacement. Each set should contain at least one observation per class so that the distribution is preserved.
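The idea of a distribution-preserving split can be sketched in base R. This is only an illustration under stated assumptions, not the infeR implementation: the helper name stratified_split_sketch is hypothetical, it samples without replacement only, it assumes ratio sums to 1, and it does not enforce the "at least one observation per class" condition for very small classes.

```r
# Hypothetical helper, NOT part of infeR: stratified split without replacement.
stratified_split_sketch <- function(df, target_column, ratio) {
  stopifnot(target_column %in% names(df),
            all(ratio >= 0),
            abs(sum(ratio) - 1) < 1e-8)
  n_sets <- length(ratio)
  # Row indices grouped by target class.
  idx_by_class <- split(seq_len(nrow(df)), df[[target_column]])
  assignment <- integer(nrow(df))
  for (idx in idx_by_class) {
    # Shuffle the rows of this class (guarding against sample()'s
    # special behaviour for a single number).
    shuffled <- if (length(idx) > 1) sample(idx) else idx
    # Upper boundary of each set's share within this class.
    cuts <- round(cumsum(ratio) * length(shuffled))
    # Map each shuffled position to the set whose boundary it falls under.
    set_id <- findInterval(seq_along(shuffled) - 1, cuts) + 1
    assignment[shuffled] <- set_id
  }
  # One data frame per set.
  lapply(seq_len(n_sets), function(i) df[assignment == i, , drop = FALSE])
}

set.seed(1)
demo <- data.frame(x = seq_len(50),
                   target_col = c(rep("A", 25), rep("B", 25)),
                   stringsAsFactors = FALSE)
parts <- stratified_split_sketch(demo, "target_col", c(0.1, 0.4, 0.3, 0.2))
```

Because each class is cut at the same relative boundaries, every set receives roughly the same share of each class, which is what keeps the target distribution comparable across sets.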

Usage

split_data(df = df, target_column = "some_column")

Arguments

df

data.frame

target_column

string; name of the column to use as the target. df must contain this column.

ratio

numeric vector; the proportions in which to split the data. Its length determines the number of sets: length(ratio) == n.

replace

logical; whether observations are sampled with replacement.

Value

list of n data frames, each with the same target distribution
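One way to check that the returned sets share the same target distribution is to tabulate the class proportions per set. The sketch below assumes the result is a plain list of data frames, as described above; target_distribution is a hypothetical helper, and the sets object is a hand-made stand-in rather than real split_data output.

```r
# Hypothetical helper, NOT part of infeR: class proportions in every set.
target_distribution <- function(sets, target_column) {
  # All classes that occur in any of the sets.
  classes <- sort(unique(unlist(
    lapply(sets, function(s) as.character(s[[target_column]]))
  )))
  # Proportion of each class within each set.
  props <- lapply(sets, function(s) {
    counts <- table(factor(s[[target_column]], levels = classes))
    as.numeric(counts) / sum(counts)
  })
  # Rows are sets, columns are classes.
  out <- do.call(rbind, props)
  colnames(out) <- classes
  out
}

# Stand-in for a split_data() result: two sets with a 50/50 class balance.
sets <- list(
  data.frame(target_col = c("A", "B"), stringsAsFactors = FALSE),
  data.frame(target_col = c("A", "A", "B", "B"), stringsAsFactors = FALSE)
)
props_by_set <- target_distribution(sets, "target_col")
```

Rows that are (nearly) identical indicate that the sets have the same target distribution.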

Examples

# Example data: 50 rows with a balanced two-class target ("A" and "B").
df <- data.frame(column1 = rep(TRUE, 50),
                 column2 = c(LETTERS[1:25], LETTERS[26:2]),
                 column3 = seq(1, 50),
                 column4 = c(rep("a", 45), rep("b", 5)),
                 column5 = seq(1, 50, by = 1),
                 target_col = c(rep("A", 25), rep("B", 25)),
                 stringsAsFactors = FALSE)
# Split into four sets holding 10%, 40%, 30% and 20% of the observations.
split_data(df = df, target_column = "target_col", ratio = c(0.1, 0.4, 0.3, 0.2))

agritag/infeR documentation built on June 8, 2019, 7:43 p.m.