step_categorical_column_with_vocabulary_file: Creates a categorical column with vocabulary file
In tfdatasets: Interface to 'TensorFlow' Datasets

step_categorical_column_with_vocabulary_file

R Documentation

Creates a categorical column with vocabulary file

Description

Use this function when the vocabulary of a categorical variable is written to a file.

Usage

step_categorical_column_with_vocabulary_file(
  spec,
  ...,
  vocabulary_file,
  vocabulary_size = NULL,
  dtype = tf$string,
  default_value = NULL,
  num_oov_buckets = 0L
)

Arguments

`spec`	A feature specification created with `feature_spec()`.
`...`	Comma separated list of variable names to apply the step. selectors can also be used.
`vocabulary_file`	The vocabulary file name.
`vocabulary_size`	Number of the elements in the vocabulary. This must be no greater than length of `vocabulary_file`, if less than length, later values are ignored. If None, it is set to the length of `vocabulary_file`.
`dtype`	The type of features. Only string and integer types are supported.
`default_value`	The integer ID value to return for out-of-vocabulary feature values, defaults to `-1`. This can not be specified with a positive `num_oov_buckets`.
`num_oov_buckets`	Non-negative integer, the number of out-of-vocabulary buckets. All out-of-vocabulary inputs will be assigned IDs in the range `⁠[vocabulary_size, vocabulary_size+num_oov_buckets)⁠` based on a hash of the input value. A positive `num_oov_buckets` can not be specified with default_value.

Value

a FeatureSpec object.

Examples

## Not run: 
library(tfdatasets)
data(hearts)
file <- tempfile()
writeLines(unique(hearts$thal), file)
hearts <- tensor_slices_dataset(hearts) %>% dataset_batch(32)

# use the formula interface
spec <- feature_spec(hearts, target ~ thal) %>%
  step_categorical_column_with_vocabulary_file(thal, vocabulary_file = file)

spec_fit <- fit(spec)
final_dataset <- hearts %>% dataset_use_spec(spec_fit)

## End(Not run)

tfdatasets documentation built on Sept. 11, 2024, 8:53 p.m.