cl_lexicon_size | R Documentation |
Get the total number of unique tokens/ids of a positional attribute. Note
that token ids are zero-based: when iterating through tokens, start at 0.
The maximum id is cl_lexicon_size() minus 1.
cl_lexicon_size(corpus, p_attribute, registry = Sys.getenv("CORPUS_REGISTRY"))
corpus | name of a CWB corpus (upper case) |
p_attribute | name of positional attribute |
registry | path to the registry directory, defaults to the value of the environment variable CORPUS_REGISTRY |
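# Number of unique tokens of the p-attribute "word"
lexicon_size <- cl_lexicon_size(
  "REUTERS",
  p_attribute = "word",
  registry = get_tmp_registry()
)

# Token ids are zero-based: 0, 1, ..., lexicon_size - 1
token_ids <- seq.int(from = 0, to = lexicon_size - 1)

# Decode all ids to their string values
cl_id2str(
  "REUTERS",
  p_attribute = "word",
  id = token_ids,
  registry = get_tmp_registry()
)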
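Because R vectors are one-based while token ids are zero-based, it is easy to be off by one when looking up decoded strings by id. The following is a minimal sketch of the index shift, not part of the package examples. It needs no corpus: `lexicon_size` and `lexicon` are hypothetical stand-ins for the values that cl_lexicon_size() and cl_id2str() would return.

```r
# Hypothetical stand-in for the value returned by cl_lexicon_size();
# no corpus is needed to run this sketch.
lexicon_size <- 5L

# Zero-based ids: 0, 1, ..., lexicon_size - 1.
# seq_len() yields integer(0) when lexicon_size is 0, whereas
# seq.int(from = 0, to = -1) would count downwards and return c(0, -1).
token_ids <- seq_len(lexicon_size) - 1L

# Hypothetical lexicon, standing in for the strings decoded from token_ids.
lexicon <- c("the", "of", "to", "in", "said")

# R vectors are one-based, so the string for id i sits at position i + 1.
id <- 3L
token <- lexicon[id + 1L]
```

Building the id vector with seq_len() is a design choice for robustness only; for any non-empty lexicon it gives the same ids as seq.int(from = 0, to = lexicon_size - 1).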