View source: R/subset_tcorpus.r
docfreq_filter | R Documentation |
Support function to enable subsetting by document frequency stats of a given feature. Should only be used within the tCorpus subset method, or any tCorpus method that supports a subset argument.
docfreq_filter(
x,
min = -Inf,
max = Inf,
top = NULL,
bottom = NULL,
doc_id = parent.frame()$doc_id
)
x | The name of the feature column. Can be given as a call or a string. |
min | A number, setting the minimum document frequency value. |
max | A number, setting the maximum document frequency value. |
top | A number n. If given, only the n features with the highest document frequency are TRUE. |
bottom | A number n. If given, only the n features with the lowest document frequency are TRUE. |
doc_id | Added for reference, but should not be used. Automatically takes doc_id from the tCorpus if docfreq_filter is used within the subset method. |
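To make the min and max arguments concrete, here is a minimal base R sketch of what a document frequency filter computes. It is an illustration under stated assumptions, not the package's implementation: the `tokens` data frame and the `docfreq` and `keep` variables are hypothetical names. It assumes that document frequency means the number of distinct documents in which a feature value occurs.

```r
# Hypothetical token table (not a tCorpus): one row per token, two documents.
tokens <- data.frame(
  doc_id = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
  token  = c("a", "a", "a", "b", "b", "a", "a", "c", "c"),
  stringsAsFactors = FALSE
)

# Document frequency: count each feature once per document it occurs in.
unique_pairs <- unique(tokens[, c("doc_id", "token")])
docfreq <- table(unique_pairs$token)

# Look up the document frequency of every token row and compare it to the
# threshold. The result is a logical vector with one element per row.
keep <- as.vector(docfreq[tokens$token]) >= 2

# Rows whose feature occurs in at least 2 documents.
tokens[keep, ]
```

In this sketch only the rows for "a" pass the threshold, because "b" and "c" each occur in a single document. A filter function of this kind returns a logical vector rather than a subset, which is why it is meant to be passed to a subset argument.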
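library(corpustools)
# Two documents: 'a' occurs in both, 'b' and 'c' each occur in only one
tc = create_tcorpus(c('a a a b b', 'a a c c'))
tc$tokens
# Keep only tokens whose feature occurs in at least 2 documents
tc$subset(subset = docfreq_filter(token, min=2))
tc$tokens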