splitData: Splitting Data

Description Usage Arguments Details Value Author(s) See Also Examples

View source: R/splitData.R

Description

this function splits the Data into Train and Test Sets, it returns a list containing two data frames for the train and test sets.

Usage

1
splitData(data, seed = NULL, y, p, ...)

Arguments

data

object of class "data.frame" with target variable and predictor variables.

seed

integer. Values for the random number generator. Default: NULL.

y

character. Target variable.

p

numeric. Proportion of data to be used for training.

...

additional arguments to be passed to createDataPartition function in caret package to control the way the data is split.

Details

splitData relies on the createDataPartition function from the caret package to perform the data split.

If y is a factor, the sampling of observations for each set is done within the levels of y such that the class distributions are more or less balanced for each set.

If y is numeric, the data is split into groups based on percentiles and the sampling done within these subgroups. See createDataPartition for more details on additional arguments that can be passed.

Value

A list with two data frames: the first as train set, and the second as test set.

Author(s)

Zakaria Kehel, Bancy Ngatia

See Also

createDataPartition

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
if(interactive()){
 # Split the data into 70/30 train and test sets for factor y
 data(septoriaDurumWC)
 split.data <- splitData(septoriaDurumWC, seed = 1234,
                         y = 'ST_S', p = 0.7)

 # Get training and test sets from list object returned
 trainset <- split.data$trainset
 testset <- split.data$testset
}

icardaFIGSr documentation built on Dec. 11, 2021, 9:21 a.m.