docfreq_filter: Support function for subset method


View source: R/subset_tcorpus.r

Description

Support function to enable subsetting by document frequency stats of a given feature. Should only be used within the tCorpus subset method, or any tCorpus method that supports a subset argument.

Usage

docfreq_filter(x, min = -Inf, max = Inf, top = NULL, bottom = NULL,
  doc_id = parent.frame()$doc_id)

Arguments

x

The name of the feature column. Can be given as a call or a string.

min

A number, setting the minimum document frequency value.

max

A number, setting the maximum document frequency value.

top

A number n. If given, only the n features with the highest document frequency are TRUE.

bottom

A number n. If given, only the n features with the lowest document frequency are TRUE.

doc_id

Added for reference, but should not be used. Automatically takes doc_id from tCorpus if the docfreq_filter function is used within the subset method.

Examples

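library(corpustools)

## two documents: 'a' occurs in both, 'b' and 'c' in one each
tc = create_tcorpus(c('a a a b b', 'a a c c'))

tc$get()
## keep only tokens with a document frequency of at least 2
tc$subset(subset = docfreq_filter(token, min=2))
tc$get()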
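
To make the idea of a document frequency filter more concrete, the following base R sketch computes, for each token position, the number of distinct documents its token occurs in, and returns a logical vector. It is only an illustration under stated assumptions: docfreq_sketch is a hypothetical helper, not part of corpustools, it covers only the min and max arguments, and it assumes both bounds are inclusive.

```r
# Hypothetical helper (not part of corpustools): a minimal sketch of a
# document frequency filter over parallel token / doc_id vectors.
# Assumption: min and max are treated as inclusive bounds.
docfreq_sketch <- function(token, doc_id, min = -Inf, max = Inf) {
  # Count each token once per document by dropping duplicate (doc, token) pairs.
  pairs <- unique(data.frame(doc_id = doc_id, token = token,
                             stringsAsFactors = FALSE))
  docfreq <- table(pairs$token)
  # Look up the document frequency for every token position.
  freq <- as.integer(docfreq[token])
  freq >= min & freq <= max
}

# Same toy data as the example above, written out as one row per token.
token  <- c('a', 'a', 'a', 'b', 'b', 'a', 'a', 'c', 'c')
doc_id <- c(1, 1, 1, 1, 1, 2, 2, 2, 2)

keep <- docfreq_sketch(token, doc_id, min = 2)
token[keep]
```

In this toy data only 'a' appears in both documents, so only the positions holding 'a' are selected. Within a tCorpus, the doc_id values are taken from the corpus automatically, which is why docfreq_filter should only be called inside the subset method.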

kasperwelbers/corpustools documentation built on Sept. 1, 2018, 1:03 p.m.