Tokenizer: Tokenizer

View source: R/preprocessing.R


Tokenizer

Description

Returns an object for vectorizing texts and/or turning texts into sequences, i.e. lists of word indices, where the word of rank i in the dataset (starting at 1) has index i.

Usage

Tokenizer(num_words = NULL,
  filters = "!\"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n", lower = TRUE,
  split = " ")

Arguments

num_words

integer. Maximum number of words to work with, or NULL for no limit.

filters

string. Characters to filter out, such as punctuation, given as a single concatenated string.

lower

boolean. Whether to set the text to lowercase.

split

string. Separator for word splitting.

Author(s)

Taylor B. Arnold, taylor.arnold@acm.org

References

Chollet, Francois. 2015. Keras: Deep Learning library for Theano and TensorFlow.

See Also

Other preprocessing: expand_dims, img_to_array, load_img, one_hot, pad_sequences, text_to_word_sequence
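Examples

A minimal usage sketch. It assumes kerasR is installed with a working Keras/Python backend, and that the fitted tokenizer exposes the underlying Python object's fit_on_texts and texts_to_sequences methods; those method names come from the wrapped Keras Tokenizer and are an assumption here, not part of the arguments documented above.

```r
## Sketch only: requires kerasR with a working Keras/Python backend.
## fit_on_texts / texts_to_sequences are methods of the underlying
## Python Tokenizer object (an assumption, not documented above).
library(kerasR)

texts <- c("the cat sat on the mat",
           "the dog ate my homework")

## Keep at most 50 words; lowercase input and split on spaces.
tok <- Tokenizer(num_words = 50, lower = TRUE, split = " ")

## Build the word index from the corpus, then map each text to a
## sequence of word indices (rank-i word in the corpus gets index i).
tok$fit_on_texts(texts)
seqs <- tok$texts_to_sequences(texts)
print(seqs)
```

The resulting sequences can then be padded to a common length with pad_sequences before being fed to a model.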


kerasR documentation built on Aug. 17, 2022, 5:06 p.m.