View source: R/tokens_sample.R
tokens_sample | R Documentation |
Take a random sample of documents of the specified size from a tokens object, with or without replacement, optionally by grouping variables or with probability weights.
tokens_sample(
  x,
  size = NULL,
  replace = FALSE,
  prob = NULL,
  by = NULL,
  env = NULL
)
x |
a tokens object whose documents will be sampled |
size |
a positive number, the number of documents to select; when used
with |
replace |
if |
prob |
a vector of probability weights for obtaining the elements of the
vector being sampled. May not be applied when |
by |
optional grouping variable for sampling. This will be evaluated in
the docvars data.frame, so that docvars may be referred to by name without
quoting. This also changes previous behaviours for |
env |
an environment or a list object in which |
a tokens object (re)sampled on the documents, containing the document variables for the documents sampled.
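As a minimal sketch of the returned value, the following builds a small toy tokens object and samples one document per group. The texts, the object name `toks_toy`, and the docvar `group` are hypothetical and chosen only for illustration; the quanteda package is assumed to be installed.

```r
library(quanteda)

# hypothetical toy data: four short documents with a made-up docvar `group`
txt <- c(d1 = "a b c", d2 = "d e f", d3 = "g h i", d4 = "j k l")
toks_toy <- tokens(txt)
docvars(toks_toy, "group") <- c("x", "x", "y", "y")

set.seed(1)
# draw one document from each level of `group` (referred to without quoting)
samp <- tokens_sample(toks_toy, size = 1, by = group)

ndoc(samp)     # number of documents in the sample
docvars(samp)  # document variables of the sampled documents only
```

The result is itself a tokens object, so it can be passed on to other tokens functions or to `dfm()`.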
See also: sample
set.seed(123)
toks <- tokens(data_corpus_inaugural[1:6])
toks
tokens_sample(toks)
tokens_sample(toks, replace = TRUE) |> docnames()
tokens_sample(toks, size = 3, replace = TRUE) |> docnames()
# sampling using by
docvars(toks)
tokens_sample(toks, size = 2, replace = TRUE, by = Party) |> docnames()
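The examples above do not use `prob`. A possible sketch of weighted sampling without grouping follows; the toy texts and the weights are hypothetical, and the weight vector is assumed to need one entry per document.

```r
library(quanteda)

# hypothetical toy data: four short documents
txt <- c(d1 = "a b c", d2 = "d e f", d3 = "g h i", d4 = "j k l")
toks_toy <- tokens(txt)

# made-up weights, one per document; d1 is favoured
w <- c(0.7, 0.1, 0.1, 0.1)

set.seed(2)
# sample two documents without replacement, using the weights
samp_w <- tokens_sample(toks_toy, size = 2, prob = w)
docnames(samp_w)
```

Because `replace` is left at `FALSE`, no document appears twice in the result.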