partition_random: Partitioning A Dataset Randomly


Description

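Creates a validation column that randomly assigns the rows of a data frame to partitions (training, validation and, optionally, test). The result can optionally be recorded in a log file.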

Usage

partition_random(x, name = 'Partition', train,
    val = 10^ceiling(log10(train)) - train, test = TRUE,
    seed = FALSE, log = eval.parent(in_log_default))
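
The default for val in the signature above can be checked by hand. The sketch below simply re-evaluates the same expression in base R; default_val is a hypothetical helper name used for illustration and is not part of cleandata.

```r
# Hypothetical helper (not part of cleandata): evaluates the default
# expression for val shown in the usage above.
default_val <- function(train) {
  10^ceiling(log10(train)) - train
}

default_val(7)   # 10 - 7   = 3
default_val(70)  # 100 - 70 = 30
default_val(65)  # 100 - 65 = 35
```

In each case train and the default val add up to the next power of 10 above train.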

Arguments

x

The data frame to partition.

name

The name of the validation column.

train

The proportion of the training set.

val

The proportion of the validation set. If not given, a default value is calculated by assuming the sum of train and val is an nth power of 10, that is, val = 10^ceiling(log10(train)) - train.

test

Whether to include a test set. If TRUE, a default value is calculated by assuming the sum of train and val is an nth power of 10.

seed

Whether to set a random seed. If you want a reproducible result, pass a number to seed as the random seed.

log

Controls log files. To produce log files, supply a list of arguments for sink(), such as file, append, and split, either directly to log or through the log_arg variable in the parent environment (dynamic scope).
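
As a rough illustration of the kind of list the log argument expects, the sketch below builds a list of sink() arguments and applies it with base R only. How partition_random consumes the list internally is not shown here and is an assumption; log_file and the printed text are hypothetical placeholders.

```r
# Assumed shape of the list: named arguments that sink() accepts.
log_file <- tempfile(fileext = ".log")  # hypothetical log location
log_arg <- list(file = log_file, append = FALSE, split = FALSE)

# Base R: divert console output to the file using the list of arguments.
do.call(sink, log_arg)
print("partition summary would be printed here")
sink()  # stop diverting output

# The diverted output is now in the file.
logged <- readLines(log_file)
```

Setting split = TRUE in the list would send the output to both the console and the file.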

Value

A partitioned column.
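
To give a feel for what a randomly partitioned column looks like, and why passing a seed makes it reproducible, here is a minimal base R sketch. It is not the package's implementation; random_labels, its label names, and its count-based arguments are assumptions made only for this example.

```r
# Hypothetical stand-in, not cleandata's code: shuffle a fixed set of labels.
random_labels <- function(n_train, n_val, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)  # fix the RNG state for reproducibility
  labels <- c(rep("train", n_train), rep("val", n_val))
  sample(labels)                      # random permutation of the labels
}

first  <- random_labels(7, 3, seed = 42)
second <- random_labels(7, 3, seed = 42)

identical(first, second)  # TRUE: same seed, same partition
table(first)              # still 7 "train" and 3 "val", only the order is random
```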

Warning

x can only be a data frame. Don't pass a vector to it.
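
If the data to partition is a plain vector, it can be wrapped in a one-column data frame first. The sketch below shows only that conversion in base R; the names v, df_v, and value are placeholders.

```r
v <- 2:16                      # a plain vector
df_v <- data.frame(value = v)  # one-column data frame holding the same values

is.vector(v)                   # TRUE
is.data.frame(df_v)            # TRUE: suitable to pass as x
```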

See Also

sink

Examples

# refer to vignettes if you want to use log files
message('refer to vignettes if you want to use log files')

# building a data frame
A <- 2:16
B <- letters[12:26]
df <- data.frame(A, B)

# partitioning with the default val (10 - 7 = 3 for train = 7)
df0 <- partition_random(df, train = 7)
df0 <- cbind(df, df0)
print(df0)

# partitioning with an explicit val
df0 <- partition_random(df, train = 7, val = 2)
df0 <- cbind(df, df0)
print(df0)

cleandata documentation built on May 1, 2019, 10:25 p.m.