prune_vocabulary: Prune vocabulary

Description

This function filters the input vocabulary, discarding very frequent and very infrequent terms. See the examples for the vocabulary function. The parameter vocab_term_max can also be used to cap the absolute size of the vocabulary, keeping only the most frequently used terms.

Usage

prune_vocabulary(vocabulary, term_count_min = 1L, term_count_max = Inf,
  doc_proportion_min = 0, doc_proportion_max = 1, doc_count_min = 1L,
  doc_count_max = Inf, vocab_term_max = Inf)
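
A minimal sketch of a typical call, assuming a recent text2vec release in which the vocabulary is built with create_vocabulary() from an itoken() iterator. The three-sentence corpus is made up for illustration and is not part of the package.

```r
library(text2vec)

# Tiny illustrative corpus (hypothetical data, not shipped with text2vec).
docs <- c("the cat sat on the mat",
          "the dog sat on the log",
          "the bird flew over the cat")

# Tokenize and build the full vocabulary.
it <- itoken(docs, tokenizer = word_tokenizer, progressbar = FALSE)
vocab <- create_vocabulary(it)

# Drop terms seen fewer than 2 times overall, and terms that
# appear in more than 90% of the documents.
pruned <- prune_vocabulary(vocab,
                           term_count_min = 2L,
                           doc_proportion_max = 0.9)
print(pruned)
```

Combining a count-based lower bound with a proportion-based upper bound is a common pattern: the first removes rare noise terms, the second removes near-universal terms that carry little signal.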

Arguments

vocabulary

a vocabulary from the vocabulary function.

term_count_min

minimum number of occurrences over all documents.

term_count_max

maximum number of occurrences over all documents.

doc_proportion_min

minimum proportion of documents that must contain the term.

doc_proportion_max

maximum proportion of documents that may contain the term.

doc_count_min

a term is kept if the number of documents containing it is larger than this value.

doc_count_max

a term is kept if the number of documents containing it is smaller than this value.

vocab_term_max

maximum number of terms in the vocabulary.
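
As a rough illustration of vocab_term_max, the sketch below caps a vocabulary at its single most frequent term. It assumes a recent text2vec release with create_vocabulary() and itoken(); the corpus is invented for the example, and was chosen so that one term is unambiguously the most frequent.

```r
library(text2vec)

# Hypothetical corpus in which "the" is clearly the most frequent term.
docs <- c("the cat sat on the mat",
          "the dog sat on the log",
          "the bird flew over the cat")

it <- itoken(docs, tokenizer = word_tokenizer, progressbar = FALSE)
vocab <- create_vocabulary(it)

# Keep only the single most frequent term.
top <- prune_vocabulary(vocab, vocab_term_max = 1L)
print(top$term)
```

When several terms share the same count at the cutoff, which of them survive may depend on how ties are broken, so it is safer not to rely on a particular result in that case.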

See Also

vocabulary


text2vec documentation built on March 26, 2020, 7:48 p.m.