split: Splits Dataset into Train and Test Datasets

split-methods    R Documentation

Splits Dataset into Train and Test Datasets

Description

Returns (invisibly) an object containing the train and test observations \bm{y}_{1}, \ldots, \bm{y}_{n} as well as the true class membership \bm{\Omega}_{g} of the test observations.
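The splitting described above can be illustrated in base R. This is a hedged sketch, not the rebmix implementation. The helper split_fraction() and the returned list elements train, test and Zt are hypothetical names. It assumes that a numeric p is the training fraction, that the training size is rounded down, and that rows are drawn by simple random sampling with sample() without stratifying by class. The result is a plain list, not an RCLS.chunk object.

```r
# Hypothetical helper (NOT part of rebmix): randomly assigns a fraction p of
# the rows of Dataset to training and the remaining rows to testing.
split_fraction <- function(p = 0.75, Dataset, class, ...) {
  stopifnot(is.data.frame(Dataset), length(p) == 1, p >= 0, p <= 1)
  n <- nrow(Dataset)
  ntrain <- floor(p * n)  # assumption: training size is rounded down
  # Extra arguments in ... are passed on to sample(), as the Rd page describes.
  idx <- sample(seq_len(n), size = ntrain, ...)
  in_train <- seq_len(n) %in% idx  # logical mask also works when ntrain == 0
  list(
    train = Dataset[in_train, , drop = FALSE],        # keeps the class column
    test  = Dataset[!in_train, -class, drop = FALSE], # class column removed
    Zt    = Dataset[[class]][!in_train]               # true test classes
  )
}

# Usage: 150 iris rows, floor(0.75 * 150) = 112 go to training.
set.seed(5)
res <- split_fraction(p = 0.75, Dataset = iris, class = 5)
c(nrow(res$train), nrow(res$test))  # 112 38
```

Using a logical mask instead of negative indexing avoids the R pitfall where Dataset[-integer(0), ] returns zero rows when p = 0.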

Usage

## S4 method for signature 'numeric'
split(p = 0.75, Dataset = data.frame(), class = numeric(), ...)
## S4 method for signature 'list'
split(p = list(), Dataset = data.frame(), class = numeric(), ...)
## ... and for other signatures

Arguments

p

see the Methods section below.

Dataset

a data frame containing the dataset Y of length n, for which the class membership \bm{\Omega}_{g} is known. The default value is data.frame().

class

the number of the column in Dataset that contains the class membership information. The default value is numeric().

...

further arguments passed to sample.

Value

Returns an object of class RCLS.chunk.

Methods

signature(p = "numeric")

a number specifying the fraction of observations used for training, 0.0 \leq p \leq 1.0. The default value is 0.75.

signature(p = "list")

a list composed of the column number p$type of the column in Dataset that contains the type membership information, followed by the values p$train and p$test that mark the train and test observations in that column. The default value is list().
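For signature(p = "list"), the assignment is read from a column rather than drawn at random. The following is a minimal base-R sketch, not rebmix code. The helper split_type() and the list elements train, test and Zt are hypothetical names. It assumes that rows whose p$type entry equals p$train form the training set, that rows equal to p$test form the test set, and that the type column is dropped from both subsets.

```r
# Hypothetical helper (NOT part of rebmix): splits Dataset according to the
# values stored in the type column p$type.
split_type <- function(p, Dataset, class) {
  stopifnot(is.list(p), all(c("type", "train", "test") %in% names(p)))
  type <- Dataset[[p$type]]
  in_train <- type %in% p$train  # %in% treats NA as "no match"
  in_test  <- type %in% p$test
  list(
    # assumption: the type column is removed from both subsets
    train = Dataset[in_train, -p$type, drop = FALSE],
    test  = Dataset[in_test, -c(p$type, class), drop = FALSE],
    Zt    = Dataset[[class]][in_test]  # true classes of the test rows
  )
}

# Usage on a small hand-made dataset: rows 3 and 5 are marked "test".
D <- data.frame(y = 1:6,
  class = c("A", "B", "A", "B", "A", "B"),
  type = c("train", "train", "test", "train", "test", "train"),
  stringsAsFactors = FALSE)
res <- split_type(list(type = 3, train = "train", test = "test"), D, class = 2)
res$Zt  # "A" "A"
```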

Author(s)

Marko Nagode

Examples

## Not run: 
data(iris)

# Split dataset into train (75 %) and test (25 %) subsets.

set.seed(5)

Iris <- split(p = 0.75, Dataset = iris, class = 5)

Iris

# Generate simulated dataset.

N <- 1000

class <- c(rep("A", 0.4 * N), rep("B", 0.2 * N),
  rep("C", 0.1 * N), rep("D", 0.05 * N), rep("E", 0.25 * N))

type <- c(rep("train", 0.75 * N), rep("test", 0.25 * N))

n <- 300

Dataset <- data.frame(1:n, sample(class, n))

colnames(Dataset) <- c("y", "class")

# Split dataset into train (60 %) and test (40 %) subsets.

simulated <- split(p = 0.6, Dataset = Dataset, class = 2)

simulated

# Generate simulated dataset with a type column.

Dataset <- data.frame(1:n, sample(class, n), sample(type, n))

colnames(Dataset) <- c("y", "class", "type")

# Split dataset into train and test subsets.

simulated <- split(p = list(type = 3, train = "train",
  test = "test"), Dataset = Dataset, class = 2)

simulated

## End(Not run)

rebmix documentation built on Sept. 11, 2024, 6:30 p.m.