View source: R/preprocessing.R
Tokenizer | R Documentation |
Returns an object for vectorizing texts and/or turning texts into sequences (lists of word indexes, where the word of rank i in the dataset, starting at 1, has index i).
Tokenizer(num_words = NULL, filters = "!\"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n", lower = TRUE, split = " ")
num_words | integer or NULL. Maximum number of words to work with. |
filters | vector (or concatenation) of characters to filter out, such as punctuation. |
lower | logical. Whether to convert the text to lowercase. |
split | string. Separator for word splitting. |
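The effect of these arguments can be illustrated with a minimal base R sketch. It does not call the package or the underlying Keras tokenizer; `sketch_tokenize` is a hypothetical helper written only for illustration. It assumes that "rank" means frequency rank (the most frequent word gets index 1) and that `num_words` keeps the `num_words` most frequent words; the real implementation may differ in the exact cutoff and in how ties are ordered.

```r
# Hypothetical helper, NOT part of the package: a base R approximation of
# what the documented arguments (num_words, filters, lower, split) control.
sketch_tokenize <- function(texts, num_words = NULL,
                            filters = "!\"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n",
                            lower = TRUE, split = " ") {
  # lower: optionally convert all text to lowercase.
  if (lower) texts <- tolower(texts)

  # filters: replace every filtered character with the separator.
  for (ch in strsplit(filters, "", fixed = TRUE)[[1]]) {
    texts <- gsub(ch, split, texts, fixed = TRUE)
  }

  # split: break each text into words and drop empty tokens.
  words <- lapply(strsplit(texts, split, fixed = TRUE),
                  function(w) w[nzchar(w)])

  # Assumption: words are ranked by frequency, most frequent first.
  counts <- sort(table(unlist(words)), decreasing = TRUE)
  word_index <- seq_along(counts)
  names(word_index) <- names(counts)

  # num_words (assumed meaning): keep only the most frequent words.
  limit <- if (is.null(num_words)) length(word_index) else num_words

  # Turn each text into a sequence of word indexes.
  sequences <- lapply(words, function(w) {
    idx <- unname(word_index[w])
    idx[!is.na(idx) & idx <= limit]
  })

  list(word_index = word_index, sequences = sequences)
}

texts <- c("The cat sat.", "The cat ran!", "The dog")
res <- sketch_tokenize(texts, num_words = 2)
```

In this example "the" (three occurrences) receives index 1 and "cat" (two occurrences) receives index 2. With `num_words = 2` every other word is dropped, so the three texts become the sequences `1 2`, `1 2`, and `1`.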
Taylor B. Arnold, taylor.arnold@acm.org
Chollet, Francois. 2015. Keras: Deep Learning library for Theano and TensorFlow.
Other preprocessing: expand_dims, img_to_array, load_img, one_hot, pad_sequences, text_to_word_sequence