SplitDataset: SplitDataset


View source: R/SplitDataset.R

Description

Splits the dataset into training, validation, and test samples. The samples can be given other names with sample.names.

Usage

SplitDataset(study.sample, events = NULL, event.variable.name = NULL,
  event.level = NULL, split.proportions = NULL,
  temporal.split = NULL, remove.missing = FALSE, random.seed = NULL,
  sample.names = NULL, return.data.frame = FALSE)

Arguments

study.sample

A data.frame. The study sample. No default.

events

A numeric vector of the same length as the number of splits, or NULL. Each item gives the number of events to include in the corresponding sample. If NULL, split.proportions is used instead. Defaults to NULL.

event.variable.name

A character vector of length 1 or NULL. The name of the variable that defines an event. Defaults to NULL.

event.level

A character or numeric vector of length 1 or NULL. The level of the variable named in event.variable.name that defines an event. Defaults to NULL.

split.proportions

A numeric vector of the same length as the number of splits, or NULL. Each item gives the proportion of the dataset to include in the corresponding sample. If NULL, events is used instead. Defaults to NULL.
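To make the idea of a proportion-based split concrete, here is a minimal base R sketch. It is not the bengaltiger implementation: the helper SplitByProportions, its arguments, and the toy data are hypothetical, and the rule that the last sample absorbs any rounding leftovers is an assumption made for this example.

```r
# Hypothetical helper, not part of bengaltiger: randomly assign each row to a
# sample so that sample sizes follow the requested proportions.
SplitByProportions <- function(data, proportions, sample.names, seed = NULL) {
    stopifnot(is.data.frame(data),
              length(proportions) == length(sample.names),
              isTRUE(all.equal(sum(proportions), 1)))
    if (!is.null(seed)) set.seed(seed)
    n <- nrow(data)
    # Assumption: round sizes down and let the last sample take the remainder
    sizes <- floor(proportions * n)
    sizes[length(sizes)] <- n - sum(sizes[-length(sizes)])
    labels <- rep(sample.names, times = sizes)
    # Place the labels on a random permutation of the row indices
    assignment <- character(n)
    assignment[sample(n)] <- labels
    split(data, factor(assignment, levels = sample.names))
}

toy <- data.frame(id = 1:10, died = rep(c("Yes", "No"), 5))
samples <- SplitByProportions(toy, c(0.6, 0.2, 0.2),
                              c("training", "validation", "test"), seed = 1)
sapply(samples, nrow)
```

The result is a named list of data frames, one per sample, whose row counts add up to the number of rows in the input.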

temporal.split

Not yet implemented.

remove.missing

A logical vector of length 1. If TRUE, observations with missing event data, as detected by is.na(), are removed from the sample and a warning is issued. If FALSE, execution stops if there is missing event data. Defaults to FALSE.
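The two behaviours described for remove.missing can be sketched in base R as follows. This is an illustration under assumptions, not the package's code: HandleMissingEvents, the message texts, and the toy data are all hypothetical.

```r
# Hypothetical helper, not bengaltiger code: either drop rows with missing
# event data (with a warning) or stop, depending on remove.missing.
HandleMissingEvents <- function(data, event.variable.name,
                                remove.missing = FALSE) {
    missing.event <- is.na(data[[event.variable.name]])
    if (!any(missing.event))
        return(data)
    if (!remove.missing)
        stop("There is missing event data in ", event.variable.name)
    warning(sum(missing.event),
            " observation(s) with missing event data removed")
    data[!missing.event, , drop = FALSE]
}

toy <- data.frame(id = 1:5, died = c("Yes", NA, "No", "Yes", NA))
# With remove.missing = TRUE the two incomplete rows are dropped
cleaned <- suppressWarnings(
    HandleMissingEvents(toy, "died", remove.missing = TRUE))
nrow(cleaned)
```

Calling the helper with the default remove.missing = FALSE on the same toy data raises an error instead of returning a data frame.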

random.seed

A numeric vector of length 1 or NULL. The random seed to be used when creating splits. Remember to set the seed outside this function if you are running other tasks that perform random operations. Defaults to NULL.
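The effect of a seed on reproducibility can be shown with base R alone. The snippet below does not call SplitDataset; it only demonstrates that resetting the seed reproduces a random draw, and that other random operations in between advance the generator state.

```r
# Same seed, same draw
set.seed(2020)
first <- sample(10)
set.seed(2020)
second <- sample(10)
identical(first, second)

# Any random operation after set.seed() advances the generator state,
# so a later draw is no longer tied directly to the seed value
set.seed(2020)
invisible(runif(1))
third <- sample(10)
```

This is why a seed set for one step does not by itself make later, unrelated random steps reproducible.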

sample.names

A character vector of the same length as events or split.proportions, depending on which is used, or NULL. If NULL, the samples will be called "training" and "test" if two samples are to be created, or "training", "validation", and "test" if three samples are to be created. Defaults to NULL.

return.data.frame

A logical vector of length 1. If TRUE, a single data.frame is returned. This data.frame includes a new column called .sample, which indicates which sample each observation belongs to. Defaults to FALSE.
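The relationship between the two return shapes can be sketched in base R. The list below is hand-made toy data, not output from SplitDataset, and storing .sample as a character column is an assumption; the package may use a different type.

```r
# Toy stand-in for a named list of samples
samples <- list(
    training = data.frame(id = 1:3),
    test = data.frame(id = 4:5)
)

# Stack the samples and record each row's origin in a .sample column
combined <- do.call(rbind, samples)
combined$.sample <- rep(names(samples), times = sapply(samples, nrow))
rownames(combined) <- NULL
combined
```

A single data frame of this shape is convenient when the sample label is needed as a grouping variable, while the list form gives direct access to each sample.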

Value

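A named list of data frames, one per sample (three by default naming: "training", "validation", and "test"), or, if return.data.frame is TRUE, a single data frame with an added .sample column indicating which sample each observation belongs to.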


martingerdin/bengaltiger documentation built on Feb. 29, 2020, 4:46 p.m.