tif_is_tokens_list: Validate Tokens List Object

View source: R/validators.R

tif_is_tokens_list    R Documentation

Validate Tokens List Object

Description

A valid corpus tokens object is a (possibly named) list of character vectors. The character vectors, as well as the names, should be in UTF-8 encoding. No other attributes should be present in either the list or any of its elements.

Usage

tif_is_tokens_list(tokens, warn = FALSE)

Arguments

tokens

a tokens object to test for validity

warn

logical. Should the function produce a verbose warning describing the condition on which the validation fails? Useful for testing.

Details

The tests are run sequentially, and the function returns on the first test that fails, issuing a warning if the warn flag is set. We use this implementation because some tests may fail entirely or be meaningless if the prior ones are not passed.
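
The sequential, fail-fast structure described above can be sketched in base R as follows. This is an illustrative reconstruction under stated assumptions, not the package's actual source: the function name sketch_is_tokens_list, the helper fail, the warning messages, the order of the tests, and the use of validUTF8() for the encoding check are all hypothetical choices made for this example.

```r
# Hypothetical sketch of a fail-fast validator; NOT the tif implementation.
sketch_is_tokens_list <- function(tokens, warn = FALSE) {
  # Helper (assumed): optionally emit a warning, then report failure.
  fail <- function(msg) {
    if (warn) warning(msg, call. = FALSE)
    FALSE
  }

  # Test 1: the object must be a list.
  if (!is.list(tokens))
    return(fail("tokens object must be a list"))

  # Test 2: the list may carry no attributes other than names.
  extra <- setdiff(names(attributes(tokens)), "names")
  if (length(extra) > 0)
    return(fail("list has attributes other than names"))

  # Test 3: every element must be a character vector.
  # This only makes sense once test 1 has passed.
  if (!all(vapply(tokens, is.character, logical(1))))
    return(fail("all elements must be character vectors"))

  # Test 4: the elements themselves may carry no attributes.
  if (!all(vapply(tokens, function(x) is.null(attributes(x)), logical(1))))
    return(fail("elements must not have attributes"))

  # Test 5 (assumed check): tokens and names must be valid UTF-8.
  # validUTF8() needs character input, which test 3 guarantees for elements.
  if (!all(vapply(tokens, function(x) all(validUTF8(x)), logical(1))))
    return(fail("tokens must be valid UTF-8"))
  if (!is.null(names(tokens)) && !all(validUTF8(names(tokens))))
    return(fail("names must be valid UTF-8"))

  TRUE
}

# Usage: a valid object passes every test.
sketch_is_tokens_list(list(doc1 = c("a", "b")))

# Usage: an integer element fails test 3; with warn = TRUE a warning is raised.
suppressWarnings(sketch_is_tokens_list(list(doc1 = 1:3), warn = TRUE))
```

Returning at the first failure means later tests can rely on earlier ones: the element-wise checks, for instance, would be meaningless for an input that is not a list at all.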

Value

a logical vector of length one indicating whether the input is a valid tokens object

Examples

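# A named list of character vectors, one vector of tokens per document
tokens <- list(doc1 = c("aujourd'hui", "maman", "est", "morte"),
               doc2 = c("it", "was", "a", "pleasure", "to", "burn"),
               doc3 = c("all", "this", "happened", "more", "or", "less"))
tif_is_tokens_list(tokens)

# Names can also be assigned after construction (here they match the existing ones)
names(tokens) <- c("doc1", "doc2", "doc3")
tif_is_tokens_list(tokens)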

ropensci/tif documentation built on Nov. 30, 2023, 7:46 p.m.