dataSplit: Split data into two different sets.

View source: R/dataSplit.R

dataSplit    R Documentation

Split data into two different sets.

Description

Split data into two different sets according to a given fraction, or into several folds. Splitting is typically used to obtain a train set and a validation (test) set.

Usage

dataSplit(x, y, f = 3/4, type = "random")

Arguments

x

The input grid object.

y

The observations object.

f

Either a fraction in the open interval (0,1) giving the share of the data that will define the train set, or an integer giving the number of folds. It can also be a list of folds, where each element is a vector with the years belonging to that fold.

type

A string indicating whether the splitting should be random (type = "random"), chronological (type = "chronological") or specified by the user (type = NULL). Default is "random".
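
The following base R sketch illustrates how a fraction and a number of folds could translate into train and test positions along the time dimension. It is not the transformeR implementation: the helpers `split_indices` and `k_fold_indices` are hypothetical, and the sketch assumes that the chronological mode assigns the earliest time steps to the train set and that folds are contiguous blocks.

```r
# Hypothetical helper, not part of transformeR: train/test positions for
# n time steps, given a training fraction f in (0, 1).
split_indices <- function(n, f = 3/4, type = c("random", "chronological")) {
  type <- match.arg(type)
  stopifnot(f > 0, f < 1)
  n_train <- round(f * n)
  train <- if (type == "chronological") {
    seq_len(n_train)              # assumption: earliest steps go to train
  } else {
    sort(sample.int(n, n_train))  # random draw without replacement
  }
  list(train = train, test = setdiff(seq_len(n), train))
}

# Hypothetical helper: k contiguous folds; fold i is the test set of
# split i and the remaining folds form its train set.
k_fold_indices <- function(n, k) {
  fold_id <- rep(seq_len(k), each = ceiling(n / k), length.out = n)
  lapply(seq_len(k), function(i) {
    list(train = which(fold_id != i), test = which(fold_id == i))
  })
}

set.seed(1)
chrono <- split_indices(20, f = 3/4, type = "chronological")
rand   <- split_indices(20, f = 3/4, type = "random")
folds  <- k_fold_indices(20, k = 3)

length(chrono$train)  # 15 of the 20 positions go to train
length(folds)         # 3 train/test pairs, one per fold
```

With a fraction, a single train/test pair is produced; with an integer, one pair per fold, so that every fold serves as the test set exactly once.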

Value

A list of folds, each containing the split x and y.
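
As a sketch of how the returned list can be traversed, the snippet below loops over the folds of a hand-built stand-in object and counts the entries of `train$y$Dates` and `test$y$Dates`, the elements accessed in the Examples. The object `mock_split` is not real dataSplit output, and representing `Dates` as a list with a `start` element is an assumption made only for illustration.

```r
# Hypothetical stand-in for a Dates element (assumed to hold a 'start' vector).
mock_dates <- function(years) list(start = paste0(years, "-01-01"))

# Hand-built mock with two folds, NOT produced by dataSplit().
mock_split <- list(
  list(train = list(y = list(Dates = mock_dates(1983:1996))),
       test  = list(y = list(Dates = mock_dates(1997:2002)))),
  list(train = list(y = list(Dates = mock_dates(1989:2002))),
       test  = list(y = list(Dates = mock_dates(1983:1988))))
)

# One column per fold, with the number of train and test dates in each.
fold_sizes <- vapply(mock_split, function(fold) {
  c(train = length(fold$train$y$Dates$start),
    test  = length(fold$test$y$Dates$start))
}, integer(2))

fold_sizes
```

The same `vapply` pattern should apply to an actual result as long as each fold exposes `train` and `test` elements as in the Examples.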

Author(s)

J. Bano-Medina

Examples


require(transformeR)        # provides makeMultiGrid and dataSplit
require(climate4R.datasets) # provides the example datasets
data("NCEP_Iberia_hus850", "NCEP_Iberia_psl", "NCEP_Iberia_ta850", "VALUE_Iberia_pr")
x <- makeMultiGrid(NCEP_Iberia_hus850, NCEP_Iberia_psl, NCEP_Iberia_ta850)
y <- VALUE_Iberia_pr
### Split the data into train and test (f < 1) ###
data.splitted <- dataSplit(x, y, f = 3/4, type = "chronological")
str(data.splitted[[1]]$train$y$Dates) # 3/4 of the data for train
str(data.splitted[[1]]$test$y$Dates)  # remaining 1/4 for test
### Split the data into 3 chronological folds ###
data.splitted <- dataSplit(x, y, f = 3, type = "chronological")
str(data.splitted[[1]]$train$y$Dates) # 2 folds out of 3 for train  
str(data.splitted[[1]]$test$y$Dates)  # 1 fold out of 3 for test
str(data.splitted[[2]]$train$y$Dates) # 2 folds out of 3 for train  
str(data.splitted[[2]]$test$y$Dates)  # 1 fold out of 3 for test
str(data.splitted[[3]]$train$y$Dates) # 2 folds out of 3 for train  
str(data.splitted[[3]]$test$y$Dates)  # 1 fold out of 3 for test
### Split the data into 3 random folds ###
data.splitted <- dataSplit(x, y, f = 3, type = "random")
str(data.splitted[[1]]$train$y$Dates) # 2 folds out of 3 for train  
str(data.splitted[[1]]$test$y$Dates)  # 1 fold out of 3 for test
str(data.splitted[[2]]$train$y$Dates) # 2 folds out of 3 for train  
str(data.splitted[[2]]$test$y$Dates)  # 1 fold out of 3 for test
str(data.splitted[[3]]$train$y$Dates) # 2 folds out of 3 for train  
str(data.splitted[[3]]$test$y$Dates)  # 1 fold out of 3 for test
### Split the data into 3 folds defined by lists of years ###
data.splitted <- dataSplit(x, y, type = "chronological",
                           f = list(c("1983","1984","1985","1986","1987",
                                      "1988","1989","1990","1991"),
                                    c("1992","1993","1994","1995","1996",
                                      "1997","1998","1999"),
                                    c("2000","2001","2002")))
str(data.splitted[[1]]$train$y$Dates) # 2 folds out of 3 for train  
str(data.splitted[[1]]$test$y$Dates)  # 1 fold out of 3 for test
str(data.splitted[[2]]$train$y$Dates) # 2 folds out of 3 for train  
str(data.splitted[[2]]$test$y$Dates)  # 1 fold out of 3 for test
str(data.splitted[[3]]$train$y$Dates) # 2 folds out of 3 for train  
str(data.splitted[[3]]$test$y$Dates)  # 1 fold out of 3 for test 


SantanderMetGroup/transformeR documentation built on Aug. 29, 2024, 6:42 a.m.