create_corpus | R Documentation |
Creates a list object containing descriptive statistics and indexed tokens from either a raw corpus or a part-frequency list.
create_corpus(
  tokens,
  parts,
  freq = NULL,
  vocab = NULL,
  doc_ids = NULL,
  type = c("per_part", "raw"),
  cutoff = 0L,
  with_distance = TRUE,
  no_match = c("fail", "remove", "keep")
)
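The two input types can be sketched with a toy corpus. This is only an illustration under stated assumptions: it assumes that "raw" means one element per token occurrence, and that "per_part" means one entry per (token, part) pair with its count passed in freq. The variable names (raw_tokens, raw_parts, pf) are hypothetical, and the calls to create_corpus() run only if a function of that name is available in the session.

```r
# Raw input (assumed layout): one element per token occurrence,
# aligned with the id of the part it occurs in.
raw_tokens <- c("a", "b", "a", "c", "a", "b")
raw_parts  <- c("d1", "d1", "d1", "d2", "d2", "d2")

# Part-frequency input (assumed layout): one row per (token, part) pair
# with its count, derived here from the raw vectors using base R.
pf <- as.data.frame(
  table(token = raw_tokens, part = raw_parts),
  responseName = "freq",
  stringsAsFactors = FALSE
)
pf <- pf[pf$freq > 0, ]  # drop pairs that never occur

# Hypothetical calls following the documented signature; skipped when
# create_corpus() is not available in the current session.
if (exists("create_corpus", mode = "function")) {
  corpus_raw <- create_corpus(raw_tokens, raw_parts, type = "raw")
  corpus_pp  <- create_corpus(pf$token, pf$part, freq = pf$freq,
                              type = "per_part")
}
```

Under these assumptions both calls describe the same corpus, so the part-frequency form is simply a compressed version of the raw vectors.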
tokens
character vector with tokens
parts
character vector
freq
optional integer vector with counts
vocab
optional character or factor vector with unique tokens
doc_ids
optional character or factor vector with part ids
type
input type, either "per_part" or "raw"
cutoff
integer; minimum frequency for each type
with_distance
logical; whether to calculate the distances required for distance-based measures
no_match
character; "fail" (default) throws an error if tokens contain NAs after creating an index. Typically, this happens when
iparts
integer index of parts
l
number of tokens in the input
f
frequency of each unique token
i
integer index of tokens per parts
j
integer index of parts per tokens
v
frequency of tokens per part
vocab
unique tokens
sort_ids
sorting permutation of tokens for use in distance-based measures
sizes
sizes of parts
list of type "corpus"
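To make the components listed above more concrete, here is a minimal base R sketch that computes comparable quantities for a toy corpus. It is not the package's implementation. In particular, reading i, j and v as a sparse triplet (token index, part index, count) and sort_ids as an ordering of token positions by token index are assumptions, and all helper names (itokens, part_ids, counts, nz) are hypothetical.

```r
# Toy raw corpus: one token per position, with the part it belongs to.
tokens <- c("a", "b", "a", "c", "a", "b")
parts  <- c("d1", "d1", "d1", "d2", "d2", "d2")

# Integer indices for tokens and parts.
vocab    <- sort(unique(tokens))     # unique tokens
itokens  <- match(tokens, vocab)     # index of each token into vocab
part_ids <- unique(parts)
iparts   <- match(parts, part_ids)   # integer index of parts

l     <- length(tokens)                                # number of tokens
f     <- tabulate(itokens, nbins = length(vocab))      # frequency per unique token
sizes <- tabulate(iparts, nbins = length(part_ids))    # size of each part

# Token-by-part counts, stored as (i, j, v) triplets (assumed layout).
counts <- table(itokens, iparts)
nz <- which(counts > 0, arr.ind = TRUE)
i  <- unname(nz[, 1])        # token index
j  <- unname(nz[, 2])        # part index
v  <- as.vector(counts[nz])  # frequency of that token in that part

# Assumed meaning of sort_ids: permutation that groups positions by token.
sort_ids <- order(itokens)
```

In this sketch the per-part counts in v sum to l, and summing v within each part index j recovers sizes, which gives a quick consistency check on any such index.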