load_vocab: Load a vocabulary file

View source: R/tokenization.R

load_vocab    R Documentation

Load a vocabulary file

Description

Reads a vocabulary file and returns its tokens together with their integer indices.

Usage

load_vocab(vocab_file)

Arguments

vocab_file

Path to the vocabulary file. The file is assumed to be a plain-text file with one token per line; the line number of each token is its index in the vocabulary.
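
To make the expected file format concrete, here is a minimal sketch in base R that writes a tiny vocabulary file and reads it back. It is not RBERT's implementation: the tokens, the temporary file, and the name sketch_vocab are hypothetical, and it assumes that indices are 1-based line numbers.

```r
# Hypothetical sketch, not RBERT's implementation: build a tiny vocabulary
# file in the expected format and read it back with base R.
tokens <- c("[PAD]", "[UNK]", "hello", "world")
vocab_path <- tempfile(fileext = ".txt")
writeLines(tokens, vocab_path)

# One token per line; the line number is taken as the token's index.
# Assumption: indices are 1-based, following R's usual convention.
file_lines <- readLines(vocab_path)
sketch_vocab <- setNames(seq_along(file_lines), file_lines)

print(sketch_vocab)
```

Because each index is simply the line position, reordering the lines of the file changes the index assigned to every token that moves.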

Value

In the BERT Python code, the vocab is returned as an OrderedDict from the collections package. Here the vocab is returned as a named integer vector: the names are the tokens in the vocabulary, and the values are their integer indices.
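
A named integer vector supports lookups in both directions. The sketch below uses a hand-built stand-in for the returned value, so the tokens and indices shown are hypothetical and do not come from any real vocabulary file.

```r
# Hand-built stand-in for a loaded vocabulary (hypothetical tokens and indices).
vocab <- c("[PAD]" = 1L, "[UNK]" = 2L, "hello" = 3L, "world" = 4L)

# Token -> index: subset the vector by name.
ids <- unname(vocab[c("hello", "world")])

# Index -> token: match the indices against the values, then take the names.
toks <- names(vocab)[match(c(4L, 3L), vocab)]

# A token that is not in the vocabulary yields NA when subsetting by name.
missing_id <- unname(vocab["unseen"])
```

Like an OrderedDict, the vector keeps its elements in insertion order, so names(vocab) lists the tokens in the same order as the file.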

Examples

## Not run: 
vocab <- load_vocab(vocab_file = "vocab.txt")

## End(Not run)

jonathanbratt/RBERT documentation built on Jan. 26, 2023, 4:15 p.m.