View source: R/layer-text_vectorization.R
This layer has basic options for managing text in a Keras model. It transforms a batch of strings (one sample = one string) into either a list of token indices (one sample = 1D tensor of integer token indices) or a dense representation (one sample = 1D tensor of float values representing data about the sample's tokens).
object | Model or layer object
max_tokens | The maximum size of the vocabulary for this layer. If
standardize | Optional specification for standardization to apply to the input text. Values can be
split | Optional specification for splitting the input text. Values can be
ngrams | Optional specification for ngrams to create from the possibly-split input text. Values can be
output_mode | Optional specification for the output of the layer. Values can be
output_sequence_length | Only valid in "int" mode. If set, the output will have its time dimension padded or truncated to exactly
pad_to_max_tokens | Only valid in "binary", "count", and "tfidf" modes. If
... | Not used.
Each sample is processed in the following steps:
1. Standardize each sample (usually lowercasing + punctuation stripping).
2. Split each sample into substrings (usually words).
3. Recombine substrings into tokens (usually ngrams).
4. Index tokens (associate a unique int value with each token).
5. Transform each sample using this index, either into a vector of ints or a dense float vector.
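The steps above can be sketched in base R to make the data flow concrete. This is an illustration only, not the layer's implementation: the toy corpus, the variable names, and the frequency-based index order are assumptions, and it does not mirror any indices the real layer may reserve (for example for padding or out-of-vocabulary tokens).

```r
# Toy corpus (hypothetical): one sample = one string.
samples <- c("The cat sat.", "The dog sat on the cat!")

# 1. Standardize: lowercase and strip punctuation.
standardized <- gsub("[[:punct:]]", "", tolower(samples))

# 2. Split each sample into substrings on whitespace.
words <- strsplit(standardized, "[[:space:]]+")

# 3. Recombine substrings into tokens. Here every word is its own
#    token (unigrams), so this step is the identity.
tokens <- words

# 4. Index tokens: most frequent first, ties broken alphabetically.
#    The position in `vocab` is the token's integer index (1-based here).
counts <- table(unlist(tokens))
vocab <- names(counts)[order(-as.integer(counts), names(counts))]

# 5a. Transform into integer vectors (one vector of token indices per sample).
int_output <- lapply(tokens, function(tok) match(tok, vocab))

# 5b. Transform into a dense numeric matrix of per-token counts
#     (one row per sample, one column per vocabulary entry).
count_output <- t(vapply(
  tokens,
  function(tok) as.numeric(tabulate(match(tok, vocab), nbins = length(vocab))),
  numeric(length(vocab))
))

print(vocab)
print(int_output)
print(count_output)
```

The integer form keeps token order and varies in length per sample, while the count form has a fixed width equal to the vocabulary size and discards order.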