genericDataPrep: Create an input pipeline using tfdatasets

View source: R/funKerasGeneric.R

genericDataPrep    R Documentation

Create an input pipeline using tfdatasets

Description

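Creates an input pipeline using tfdatasets: the data are converted to batched train, validation, and test datasets, and a feature spec is fitted.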

Usage

genericDataPrep(
  data,
  batch_size = 32,
  minLevelSizeEmbedding = 100,
  embeddingDim = NULL
)

Arguments

data

list. Data sets, e.g., df$trainCensus, df$testGeneric, and df$valCensus.

batch_size

integer. Batch size. Default: 32.

minLevelSizeEmbedding

integer. Embedding will be used for factor variables with more than minLevelSizeEmbedding levels. Default: 100.

embeddingDim

integer. Dimension used for embedding. Default: NULL, in which case floor(log(minLevelSizeEmbedding)) is used.
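
The rule described for minLevelSizeEmbedding and embeddingDim can be illustrated with a minimal base R sketch. It only mirrors the documented rule and does not call genericDataPrep; how the package applies the rule internally is not shown here. The toy data frame and its column names (occupation, sex, age) are hypothetical and chosen for illustration only.

```r
## Hypothetical toy data; column names are illustrative assumptions.
set.seed(1)
toy <- data.frame(
  occupation = factor(sample(paste0("occ", 1:150), 500, replace = TRUE),
                      levels = paste0("occ", 1:150)),
  sex = factor(sample(c("f", "m"), 500, replace = TRUE)),
  age = runif(500, 18, 90)
)

minLevelSizeEmbedding <- 100

## Count the levels of every factor column.
isFactor <- vapply(toy, is.factor, logical(1))
nLevels <- vapply(toy[isFactor], nlevels, integer(1))

## Factors with more than minLevelSizeEmbedding levels are
## candidates for an embedding according to the documented rule.
embedVars <- names(nLevels)[nLevels > minLevelSizeEmbedding]

## Documented default for the embedding dimension (natural logarithm).
embeddingDim <- floor(log(minLevelSizeEmbedding))

print(embedVars)
print(embeddingDim)
```

With the default minLevelSizeEmbedding = 100, the documented default dimension is floor(log(100)) = 4. In the toy data only occupation, with 150 levels, exceeds the threshold.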

Value

A fitted FeatureSpec object and the hold-out testGeneric (= data$testGeneric), together with the train, validation, and test datasets. These are returned as the following list.

train_ds_generic

training dataset

val_ds_generic

validation dataset

test_ds_generic

test dataset

specGeneric_prep

fitted FeatureSpec object

testGeneric

data$testGeneric

Examples


### These examples require an activated Python environment as described in
### Bartz-Beielstein, T., Rehbach, F., Sen, A., and Zaefferer, M.:
### Surrogate Model Based Hyperparameter Tuning for Deep Learning with SPOT,
### June 2021. http://arxiv.org/abs/2105.14625.
PYTHON_RETICULATE <- FALSE
if (PYTHON_RETICULATE) {
  target <- "age"
  batch_size <- 32
  prop <- 2/3
  cachedir <- "oml.cache"
  ## Load the Census data:
  dfCensus <- getDataCensus(target = target,
                            nobs = 1000, cachedir = cachedir, cache.only = FALSE)
  ## Split into train, validation, and test data:
  data <- getGenericTrainValTestData(dfGeneric = dfCensus,
                                     prop = prop)
  ## Build the input pipeline:
  specList <- genericDataPrep(data = data, batch_size = batch_size)
  ## Call iterator to retrieve the first batch:
  library(magrittr)
  specList$train_ds_generic %>%
    reticulate::as_iterator() %>%
    reticulate::iter_next()
}



SPOTMisc documentation built on Sept. 5, 2022, 5:06 p.m.