tokenize_word: Tokenize a single "word" (no whitespace).

View source: R/tokenization.R

tokenize_word    R Documentation

Tokenize a single "word" (no whitespace).

Description

In BERT's tokenization.py, this code is inside the tokenize method of WordpieceTokenizer objects. I've moved it into its own function for clarity. Punctuation should already have been removed from the word before it is passed in.
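
The underlying algorithm is a greedy "longest-match-first" wordpiece split. Below is a minimal sketch of that algorithm in R, assuming vocab is a named vector whose names are the vocabulary tokens (as in the Examples); it illustrates the technique and is not RBERT's exact implementation.

wordpiece_sketch <- function(word, vocab, unk_token = "[UNK]", max_chars = 100) {
  # Overlong words are mapped directly to the unknown token.
  if (nchar(word) > max_chars) {
    return(unk_token)
  }
  tokens <- character(0)
  start <- 1
  len <- nchar(word)
  while (start <= len) {
    end <- len
    cur_piece <- NA_character_
    # Try the longest remaining substring first, shrinking until a match.
    while (start <= end) {
      piece <- substr(word, start, end)
      if (start > 1) {
        piece <- paste0("##", piece)  # non-initial pieces carry the ## prefix
      }
      if (piece %in% names(vocab)) {
        cur_piece <- piece
        break
      }
      end <- end - 1
    }
    # If no substring matched, the whole word is unknown.
    if (is.na(cur_piece)) {
      return(unk_token)
    }
    tokens <- c(tokens, cur_piece)
    start <- end + 1
  }
  tokens
}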

Usage

tokenize_word(word, vocab, unk_token = "[UNK]", max_chars = 100)

Arguments

word

Word to tokenize.

vocab

Vocabulary to match word pieces against. In the examples below, this is a named vector whose names are the vocabulary tokens.

unk_token

Token to represent unknown words.

max_chars

Maximum length (in characters) of a word to tokenize; longer words are mapped to unk_token.

Value

Input word as a list of tokens.

Examples

tokenize_word("unknown", vocab = c("un" = 0, "##known" = 1))
tokenize_word("known", vocab = c("un" = 0, "##known" = 1))
