token_text_encoder: Token Text Encoder

View source: R/features_text.R

Description

Constructs a TokenTextEncoder.

Usage

token_text_encoder(
  vocab_list,
  oov_buckets = 1,
  oov_token = "UNK",
  lowercase = FALSE,
  tokenizer = NULL,
  strip_vocab = TRUE,
  decode_token_separator = " "
)

Arguments

vocab_list

list of tokens

oov_buckets

the number of integers to reserve for out-of-vocabulary (OOV) hash buckets. Tokens that are OOV will be hash-modded into an OOV bucket in encode.

oov_token

the string to use for OOV ids in decode.

lowercase

whether to make all text and tokens lowercase.

tokenizer

Tokenizer responsible for converting incoming text into a list of tokens.

strip_vocab

whether to strip whitespace from the beginning and end of elements of vocab_list.

decode_token_separator

the string used to separate tokens when decoding.
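
To make the interaction of these arguments more concrete, the following base R sketch imitates the behaviour described above without calling the package. It is an illustration under stated assumptions, not the library's implementation: toy_hash(), toy_encode() and toy_decode() are hypothetical helpers, the character-code sum is a stand-in for whatever hash the real encoder uses, whitespace splitting stands in for the tokenizer, and the id layout (vocabulary ids first, OOV bucket ids after them) is assumed for clarity.

```r
# Hypothetical stand-in hash: sum of the token's Unicode code points.
# The real encoder's hash function is not documented here.
toy_hash <- function(token) sum(utf8ToInt(token))

# Toy encoder: in-vocabulary tokens get their position in vocab_list;
# OOV tokens are hash-modded into one of `oov_buckets` ids placed
# after the vocabulary ids (assumed layout).
toy_encode <- function(text, vocab_list, oov_buckets = 1,
                       lowercase = FALSE, strip_vocab = TRUE) {
  vocab <- unlist(vocab_list)
  if (strip_vocab) vocab <- trimws(vocab)
  if (lowercase) {
    vocab <- tolower(vocab)
    text <- tolower(text)
  }
  # Whitespace splitting stands in for the `tokenizer` argument.
  tokens <- strsplit(text, "[[:space:]]+")[[1]]
  tokens <- tokens[nzchar(tokens)]
  ids <- match(tokens, vocab)  # NA marks an OOV token
  oov <- is.na(ids)
  ids[oov] <- length(vocab) + 1 +
    vapply(tokens[oov], toy_hash, integer(1)) %% oov_buckets
  as.integer(ids)
}

# Toy decoder: ids beyond the vocabulary are rendered as `oov_token`,
# and tokens are joined with `decode_token_separator`.
toy_decode <- function(ids, vocab_list, oov_token = "UNK",
                       decode_token_separator = " ", strip_vocab = TRUE) {
  vocab <- unlist(vocab_list)
  if (strip_vocab) vocab <- trimws(vocab)
  tokens <- ifelse(ids <= length(vocab), vocab[ids], oov_token)
  paste(tokens, collapse = decode_token_separator)
}

vocab <- c("hello", " world ")
ids <- toy_encode("hello there world", vocab)
toy_decode(ids, vocab)
```

In this sketch, a round trip through encoding and decoding is lossy for OOV tokens: every id that falls in an OOV bucket decodes to the same oov_token, regardless of which token produced it.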

See Also

save_token_text_encoder(), load_token_text_encoder(), encode() and decode()


rstudio/tfds documentation built on Nov. 25, 2021, 6:20 p.m.