step_embedding_column: Creates embeddings columns
In rstudio/tfdatasets: Interface to 'TensorFlow' Datasets

step_embedding_column

R Documentation

Creates embeddings columns

Description

Use this step to create ambeddings columns from categorical columns.

Usage

step_embedding_column(
  spec,
  ...,
  dimension = function(x) {
     as.integer(x^0.25)
 },
  combiner = "mean",
  initializer = NULL,
  ckpt_to_load_from = NULL,
  tensor_name_in_ckpt = NULL,
  max_norm = NULL,
  trainable = TRUE
)

Arguments

`spec`	A feature specification created with `feature_spec()`.
`...`	Comma separated list of variable names to apply the step. selectors can also be used.
`dimension`	An integer specifying dimension of the embedding, must be > 0. Can also be a function of the size of the vocabulary.
`combiner`	A string specifying how to reduce if there are multiple entries in a single row. Currently 'mean', 'sqrtn' and 'sum' are supported, with 'mean' the default. 'sqrtn' often achieves good accuracy, in particular with bag-of-words columns. Each of this can be thought as example level normalizations on the column. For more information, see `tf.embedding_lookup_sparse`.
`initializer`	A variable initializer function to be used in embedding variable initialization. If not specified, defaults to `tf.truncated_normal_initializer` with mean `0.0` and standard deviation `1/sqrt(dimension)`.
`ckpt_to_load_from`	String representing checkpoint name/pattern from which to restore column weights. Required if `tensor_name_in_ckpt` is not `NULL`.
`tensor_name_in_ckpt`	Name of the Tensor in ckpt_to_load_from from which to restore the column weights. Required if `ckpt_to_load_from` is not `NULL`.
`max_norm`	If not `NULL`, embedding values are l2-normalized to this value.
`trainable`	Whether or not the embedding is trainable. Default is `TRUE`.

Value

a FeatureSpec object.

Examples

## Not run: 
library(tfdatasets)
data(hearts)
file <- tempfile()
writeLines(unique(hearts$thal), file)
hearts <- tensor_slices_dataset(hearts) %>% dataset_batch(32)

# use the formula interface
spec <- feature_spec(hearts, target ~ thal) %>%
  step_categorical_column_with_vocabulary_list(thal) %>%
  step_embedding_column(thal, dimension = 3)
spec_fit <- fit(spec)
final_dataset <- hearts %>% dataset_use_spec(spec_fit)

## End(Not run)

rstudio/tfdatasets documentation built on April 13, 2025, 6:50 p.m.