tokens_trim | R Documentation |
Returns a tokens object reduced in size based on document and term frequency, usually by applying a minimum frequency, but optionally also a maximum frequency. Setting both a minimum and a maximum selects features whose frequencies fall within that range.
tokens_trim(
  x,
  min_termfreq = NULL,
  max_termfreq = NULL,
  termfreq_type = c("count", "prop", "rank", "quantile"),
  min_docfreq = NULL,
  max_docfreq = NULL,
  docfreq_type = c("count", "prop", "rank", "quantile"),
  padding = FALSE,
  verbose = quanteda_options("verbose")
)
x |
a tokens object |
min_termfreq , max_termfreq |
minimum/maximum values of feature frequencies across all documents, below/above which features will be removed |
termfreq_type |
how min_termfreq and max_termfreq are interpreted: one of "count", "prop", "rank", or "quantile" |
min_docfreq , max_docfreq |
minimum/maximum values of a feature's document frequency, below/above which features will be removed |
docfreq_type |
specify how min_docfreq and max_docfreq are interpreted: one of "count", "prop", "rank", or "quantile" |
padding |
if TRUE, leave an empty string where the removed tokens previously existed |
verbose |
print messages |
A tokens object with reduced size.
dfm_trim()
toks <- tokens(data_corpus_inaugural)
# keep only words occurring >= 10 times and in >= 2 documents
tokens_trim(toks, min_termfreq = 10, min_docfreq = 2, padding = TRUE)
# keep only words occurring >= 10 times and in no more than 90% of the documents
tokens_trim(toks, min_termfreq = 10, max_docfreq = 0.9, docfreq_type = "prop",
            padding = TRUE)
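The effect of the thresholds is easier to see on a tiny input than on the inaugural corpus. The following is a minimal sketch, assuming quanteda is installed; the three toy documents and the object names txt, toks_small and trimmed are hypothetical and chosen only for illustration.

```r
library(quanteda)

# Toy documents (hypothetical data, for illustration only)
txt <- c(d1 = "a b b c", d2 = "a b d", d3 = "a e")
toks_small <- tokens(txt)

# Overall counts: a = 3 (3 docs), b = 3 (2 docs), c = d = e = 1 (1 doc each).
# Keep only features occurring >= 2 times overall and in >= 2 documents,
# so "c", "d" and "e" are dropped.
trimmed <- tokens_trim(toks_small, min_termfreq = 2, min_docfreq = 2)
as.list(trimmed)
```

Passing padding = TRUE to the same call should keep the original token positions, which matters when the trimmed object is later used for position-sensitive operations such as collocation or n-gram analysis.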