Description Usage Arguments Value Examples
View source: R/corpus_sample.R
Take a random sample of documents of the specified size from a corpus, with
or without replacement. Works just as sample()
works for the
documents and their associated document-level variables.
1 |
x |
a corpus object whose documents will be sampled |
size |
a positive number, the number of documents to select; when used with groups, the number to select from each group or a vector equal in length to the number of groups defining the samples to be chosen in each group category. By defining a size larger than the number of documents, it is possible to oversample groups. |
replace |
Should sampling be with replacement? |
prob |
A vector of probability weights for obtaining the elements of the
vector being sampled. May not be applied when |
by |
a grouping variable for sampling. Useful for resampling
sub-document units such as sentences, for instance by specifying |
A corpus object with number of documents equal to size
, drawn
from the corpus x
. The returned corpus object will contain all of
the meta-data of the original corpus, and the same document variables for
the documents selected.
1 2 3 4 5 6 7 8 9 10 11 | set.seed(2000)
# sampling from a corpus
summary(corpus_sample(data_corpus_inaugural, 5))
summary(corpus_sample(data_corpus_inaugural, 10, replace = TRUE))
# sampling sentences within document
corp <- corpus(c(one = "Sentence one. Sentence two. Third sentence.",
two = "First sentence, doc2. Second sentence, doc2."))
corpsent <- corpus_reshape(corp, to = "sentences")
texts(corpsent)
texts(corpus_sample(corpsent, replace = TRUE, by = "document"))
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.