getGenericTrainValTestData: getGenericTrainValTestData
In SPOTMisc: Misc Extensions for the 'SPOT' Package

Description

Splits a data set into training, validation, and test data in two steps, with the proportions controlled by prop.

Usage

getGenericTrainValTestData(dfGeneric = NULL, prop = 0.5)

Arguments

dfGeneric

A data set, e.g., one obtained with getDataCensus. Default: NULL.

prop

Numeric vector of length one or two giving the proportions for the training/test split and the training/validation split. Default: 0.5. If a single value is given, the same proportion is used for both splits. Otherwise, the first entry is used for the training/test split and the second entry for the training/validation split. If the second value is 1, the validation set is empty. Given prop = (p1, p2), the data are partitioned in the following two steps:

Step 1: train1 = p1 * data and test = (1 - p1) * data
Step 2: train2 = p2 * train1 = p2 * p1 * data and val = (1 - p2) * train1 = (1 - p2) * p1 * data
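The two steps can be sketched in base R as follows. This is a hypothetical helper, splitTrainValTest(), written for illustration only; it is not the SPOTMisc implementation, and it assumes simple random sampling without replacement, with the training-set sizes rounded down via floor(). The list element names are chosen to mirror the trainGeneric element used in the example below.

```r
## Hypothetical helper, NOT the SPOTMisc implementation: a base-R sketch
## of the two-step split using simple random sampling without replacement.
splitTrainValTest <- function(df, prop = 0.5) {
  # A single proportion is recycled and used for both splits
  if (length(prop) == 1) prop <- c(prop, prop)
  stopifnot(length(prop) == 2, all(prop >= 0 & prop <= 1))

  # Step 1: train1 = p1 * data, test = (1 - p1) * data
  n <- nrow(df)
  inTrain1 <- seq_len(n) %in% sample.int(n, size = floor(prop[1] * n))
  train1 <- df[inTrain1, , drop = FALSE]
  test <- df[!inTrain1, , drop = FALSE]

  # Step 2: train2 = p2 * train1, val = (1 - p2) * train1
  # (with p2 = 1 every train1 row stays in train2 and val is empty)
  n1 <- nrow(train1)
  inTrain2 <- seq_len(n1) %in% sample.int(n1, size = floor(prop[2] * n1))
  list(trainGeneric = train1[inTrain2, , drop = FALSE],
       valGeneric = train1[!inTrain2, , drop = FALSE],
       testGeneric = test)
}

# Usage with toy data; set a seed for a reproducible partition
set.seed(1)
toy <- data.frame(id = 1:10000, x = rnorm(10000))
parts <- splitTrainValTest(toy, prop = 2 / 3)
sapply(parts, nrow)
```

Logical indexing with %in% is used instead of negative indices so that an empty sample (e.g., a proportion of 0) still yields the correct complementary set.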

Value

A list with the training, validation, and test data: trainGeneric, valGeneric, testGeneric.

Note

If p2=1, no validation data will be generated.
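The expected (unrounded) set sizes follow directly from Steps 1 and 2. The hypothetical helper expectedSplitSizes() below, which is not part of SPOTMisc, computes them for n rows; it makes no assumption about how the package rounds counts to whole rows.

```r
## Hypothetical helper (not part of SPOTMisc): unrounded expected set
## sizes for n rows, following Steps 1 and 2 of the prop description.
expectedSplitSizes <- function(n, prop = 0.5) {
  # A single proportion is used for both splits
  if (length(prop) == 1) prop <- c(prop, prop)
  p1 <- prop[1]
  p2 <- prop[2]
  c(train = p2 * p1 * n,        # Step 2: p2 * p1 * data
    val   = (1 - p2) * p1 * n,  # Step 2: (1 - p2) * p1 * data
    test  = (1 - p1) * n)       # Step 1: (1 - p1) * data
}

expectedSplitSizes(1e4, prop = 2 / 3)      # setting used in the example
expectedSplitSizes(1e4, prop = c(0.8, 1))  # p2 = 1: no validation data
```

The three sizes always sum to n, and with p2 = 1 the validation size is exactly zero, matching the note above.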

Examples


### These examples require an activated Python environment as described in
### Bartz-Beielstein, T., Rehbach, F., Sen, A., and Zaefferer, M.:
### Surrogate Model Based Hyperparameter Tuning for Deep Learning with SPOT,
### June 2021. http://arxiv.org/abs/2105.14625.
PYTHON_RETICULATE <- FALSE
if (PYTHON_RETICULATE) {
  task.type <- "classif"
  nobs <- 1e4
  nfactors <- "high"
  nnumericals <- "high"
  cardinality <- "high"
  data.seed <- 1
  cachedir <- "oml.cache"
  target <- "age"
  prop <- 2 / 3
  dfCensus <- getDataCensus(task.type = task.type,
                            nobs = nobs, nfactors = nfactors,
                            nnumericals = nnumericals, cardinality = cardinality,
                            data.seed = data.seed, cachedir = cachedir,
                            target = target)
  census <- getGenericTrainValTestData(dfGeneric = dfCensus,
                                       prop = prop)
  ## train data size is 2/3 * 2/3 * 10000:
  dim(census$trainGeneric)
}

SPOTMisc documentation built on Sept. 5, 2022, 5:06 p.m.